Python/pandas: add a word to a column based on words in another column

import pandas as pd
import numpy as np

bodyparts = ['lip ', 'lips ', 'foot ', 'feet ', 'heel ', 'heels ', 'hand ', 'hands ']

df = pd.read_excel(file)

for word in bodyparts:
    if word in df["Sentence"]:
        df["Type"] = df["Type"].replace(np.nan, "bodypart", regex=True)

3 Answers

User

Answer 1 · edited 2024-09-27 07:24:43

You can build a regular expression that matches the words at word boundaries and pass it to str.contains, for example:

import pandas as pd 
import numpy as np
import re

bodyparts = ['lips?', 'foot', 'feet', 'heels?', 'hands?', 'legs?']
rx = re.compile('|'.join(r'\b{}\b'.format(el) for el in bodyparts))

df = pd.DataFrame({
    'Sentence': ['my hand', 'the fish', 'the rabbit leg', 'hand over', 'something', 'cabbage', 'slippage'],
    'Type': [np.nan] * 7
})

df.loc[df.Sentence.str.contains(rx), 'Type'] = 'bodypart'

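This gives you:

         Sentence      Type
0         my hand  bodypart
1        the fish       NaN
2  the rabbit leg  bodypart
3       hand over  bodypart
4       something       NaN
5         cabbage       NaN
6        slippage       NaN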
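As a variant of the same idea, here is a minimal sketch that starts from plain words instead of hand-written regex fragments. The names `words`, `pattern`, and `mask` are illustrative, and the case-insensitive match and the object-dtype `Type` column are assumptions, not part of the answer above.

```python
import re
import pandas as pd

# Plain words (no regex syntax); re.escape guards against special characters.
words = ['lip', 'lips', 'foot', 'feet', 'heel', 'heels', 'hand', 'hands', 'leg', 'legs']
pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'

df = pd.DataFrame({
    'Sentence': ['my hand', 'the fish', 'the rabbit leg', 'Hand over', 'slippage'],
})
# Assumption: an object column, so it can hold both missing values and strings.
df['Type'] = pd.Series([None] * len(df), dtype=object)

# True where the sentence contains any of the words as a whole word.
mask = df['Sentence'].str.contains(pattern, case=False, regex=True)
df.loc[mask, 'Type'] = 'bodypart'
```

The non-capturing group `(?:...)` keeps a single pair of `\b` anchors around the whole alternation, so 'slippage' is still not matched by 'lip'.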

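User

Answer 2 · edited 2024-09-27 07:24:43

A quick-and-dirty solution is to check the intersection of two sets.

Set A is your list of body parts, and set B is the set of words in the sentence.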

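# A non-empty intersection means the sentence contains at least one body part.
# bodyparts must hold plain words without trailing spaces.
df['Type'] = df['Sentence']\
     .apply(lambda x: 'bodypart' if set(x.split()) \
     .intersection(bodyparts) else None)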
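The set-intersection idea can be sketched end to end as follows. The sample sentences and the helper name `label` are made up for illustration, and lowercasing the sentence is an added assumption.

```python
import pandas as pd

# Set A: the body-part words, without trailing spaces.
bodyparts = {'lip', 'lips', 'foot', 'feet', 'heel', 'heels', 'hand', 'hands'}

df = pd.DataFrame({'Sentence': ['my hand hurts', 'the fish', 'Cold feet', 'slippage']})

def label(sentence):
    # Set B: the whitespace-separated words of the sentence, lowercased.
    words = set(sentence.lower().split())
    # A non-empty intersection means at least one body part is present.
    return 'bodypart' if words & bodyparts else None

df['Type'] = df['Sentence'].apply(label)
```

Because the sentence is split on whitespace only, a word with punctuation attached (such as "hand,") would not match; the regex approach with word boundaries handles that case.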

User

Answer 3 · edited 2024-09-27 07:24:43

The simplest way is:

df.loc[df.Sentence.isin(bodyparts),'Type']='Bodypart'

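Before doing that, strip the trailing spaces from the entries in bodyparts (for example, 'lip ' must become 'lip').

df.Sentence.isin(bodyparts) selects the matching rows, and 'Type' is the column to set. .loc is the indexer that allows the modification. Note that isin compares whole cell values, so only rows whose Sentence is exactly one of the body-part words are selected.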
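A minimal sketch of this approach, assuming the spaces are stripped with a list comprehension (the answer's own snippet for that step is not shown here) and using made-up sample data:

```python
import pandas as pd

raw_bodyparts = ['lip ', 'lips ', 'foot ', 'feet ', 'heel ', 'heels ', 'hand ', 'hands ']
# Assumption: strip() is one way to drop the trailing spaces.
bodyparts = [word.strip() for word in raw_bodyparts]

df = pd.DataFrame({'Sentence': ['hand', 'my hand', 'feet', 'fish']})
# Assumption: an object column, so it can hold both missing values and strings.
df['Type'] = pd.Series([None] * len(df), dtype=object)

# isin tests each cell for exact membership in bodyparts.
mask = df['Sentence'].isin(bodyparts)
df.loc[mask, 'Type'] = 'Bodypart'
```

Here 'hand' and 'feet' are labeled, while 'my hand' is not, because the whole cell has to equal one of the words. For sentences longer than one word, a substring or word-based test such as str.contains is the better fit.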
