How do I remove exact word matches from a DataFrame in Python?

2 answers

User

Answer 1 · edited 2024-10-02 20:35:38

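Plain Python string replacement won't work here, because it also matches the word inside longer words. The regular expression module (`re`) will. You need to add some markers to the pattern so that the regex only finds complete words. For example, you might know it is a complete word because it is followed by a period `.`, a comma `,`, any kind of whitespace `\s`, or the end of the line `$`. `\b` is the regex pattern for a word boundary.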

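import re

# df['game'].str is an accessor, not a string, so call its replace() method
s1 = df['game']
for sw in stopWords:
    # \b on both sides matches sw only as a whole word; re.escape guards special characters
    s1 = s1.str.replace(r'\b{0}\b'.format(re.escape(sw)), '', regex=True)
df['game'] = s1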

(I stole the `\b` from this other good answer.)
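A quick way to see why plain string replacement is not enough, and why a boundary is useful on both sides of the word. This uses only the standard library on a made-up sample string (the text is an assumption, not data from the question):

```python
import re

# Made-up sample text for illustration
text = "juego juegos videojuego"

# Plain substring replacement also hits "juegos" and "videojuego"
print(repr(text.replace("juego", "")))

# A trailing \b alone still matches at the end of "videojuego"
print(repr(re.sub(r"juego\b", "", text)))

# \b on both sides matches only the standalone word "juego"
print(repr(re.sub(r"\bjuego\b", "", text)))
```

The three lines print `' s video'`, `' juegos video'` and `' juegos videojuego'` respectively.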

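I'm keeping the old code in case you're interested. It also removes the space, comma or period directly after the matched word, which isn't what you asked for but may be useful: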

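import re

s1 = df['game']
for sw in stopWords:
    # Also consume the '.', ',' or whitespace right after the word (or the end of the string)
    s1 = s1.str.replace(r'\b{0}([.,\s]|$)'.format(re.escape(sw)), '', regex=True)
df['game'] = s1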
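Since `df` and `stopWords` are not defined in the answer, here is a self-contained sketch of both loops on made-up sample data (the column values and the `stopWords` list are assumptions). The results go into two hypothetical extra columns, `whole_word` and `with_separator`, so the patterns can be compared side by side:

```python
import re

import pandas as pd

# Hypothetical sample data and stop words
df = pd.DataFrame({'game': ['juegos blue', 'juego, red', 'videojuego green']})
stopWords = ['juego', 'juegos']

# Variant 1: remove whole words only
s1 = df['game']
for sw in stopWords:
    s1 = s1.str.replace(r'\b{0}\b'.format(re.escape(sw)), '', regex=True)
df['whole_word'] = s1

# Variant 2: also remove the '.', ',' or whitespace that follows the word
s2 = df['game']
for sw in stopWords:
    s2 = s2.str.replace(r'\b{0}([.,\s]|$)'.format(re.escape(sw)), '', regex=True)
df['with_separator'] = s2

print(df)
```

Variant 1 leaves `' blue'` and `', red'`, while variant 2 gives `'blue'` and `' red'`; neither touches `'videojuego green'`. In both cases you can chain `.str.strip()` to trim whitespace left at the ends of each string.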

User

Answer 2 · edited 2024-10-02 20:35:38

Just use `DataFrame.replace()`:

In [1]: import pandas as pd
   ...: df = pd.DataFrame({'game': ['juegos blue', 'juego red', 'juegos yellow']})
   ...: stop_words = [r'juego\b', r'juegos\b']
   ...: df.replace(to_replace={'game': '|'.join(stop_words)}, value='', regex=True, inplace=True)
   ...: df
Out[1]: 
      game
0     blue
1      red
2   yellow

In [2]: df = pd.DataFrame({'game': ['juegos blue', 'juego red', 'juegos yellow']})
   ...: stop_words = [r'juego\b']
   ...: df.replace(to_replace={'game': '|'.join(stop_words)}, value='', regex=True, inplace=True)
   ...: df
Out[2]: 
            game
0    juegos blue
1            red
2  juegos yellow

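This assumes each pattern in `stop_words` ends with the word boundary `\b`. Add a leading `\b` as well if the word could also appear at the end of a longer word (e.g. `videojuego`).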
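If the stop words start out as plain words, the boundary-wrapped patterns can be built instead of written by hand. A minimal sketch, assuming the words consist only of letters, digits or underscores (`re.escape` is just a safeguard, and `words` is a hypothetical name):

```python
import re

import pandas as pd

df = pd.DataFrame({'game': ['juegos blue', 'juego red', 'videojuego green']})
words = ['juego', 'juegos']  # hypothetical plain stop words, no regex syntax

# Wrap each escaped word in \b on both sides and join them into one alternation
pattern = '|'.join(r'\b{0}\b'.format(re.escape(w)) for w in words)

df = df.replace(to_replace={'game': pattern}, value='', regex=True)
# Trim the space left behind where a word was removed
df['game'] = df['game'].str.strip()
print(df)
```

With a boundary on both sides, the order of the alternatives does not matter, and `videojuego green` is left untouched because `juego` is not a separate word there.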
