Pandas: count occurrences of words (from another DataFrame) and output both the count and the matched words

Posted 2024-04-19 03:15:54


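I have a DataFrame (df) with a column of sentences, and a second DataFrame (df2) with a column of words. For each row in df, I want to count how many times words from df2 appear in the sentence, write that count to a new column, and, when there is a match, write the matched words to another new column.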

I have worked out how to do the count, but I can't work out how to output the matched words. See the df_desiredoutput DataFrame for what I want. Thanks in advance.

Here is a minimal example:

import pandas as pd
import re

df = pd.DataFrame({'sentence': ['Hello how are you', 'It is nice outside today', 'I need to water the plants', 'I need to cook dinner', 'See you tommorow']})
print(df)

df2 = pd.DataFrame({'words': ['hello', 'you', 'plants', 'need', 'tommorow']})
print(df2)

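# Build one regex alternation ('hello|you|...') and count its matches per sentence, ignoring case
df["count"] = df["sentence"].str.count('|'.join(df2['words']), flags=re.I)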
print(df)

df_desiredoutput = pd.DataFrame({'sentence': ['Hello how are you', 'It is nice outside today', 'I need to water the plants', 'I need to cook dinner', 'See you tommorow'],
                          'count': [2, 0, 2, 1, 2],
                          'match': ['hello; you', '', 'need; plants', 'need', 'you; tommorow']})
print(df_desiredoutput)

1 answer
Anonymous user
#1 · posted 2024-04-19 03:15:54

Use Series.str.findall together with Series.str.join:

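# One alternation pattern built from the words in df2
pat = '|'.join(df2['words'])
df["count"] = df["sentence"].str.count(pat, flags=re.I)
# findall returns a list of matched substrings per row; join turns each list into 'a; b'
df["match"] = df["sentence"].str.findall(pat, flags=re.I).str.join('; ')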
print(df)
                     sentence  count          match
0           Hello how are you      2     Hello; you
1    It is nice outside today      0               
2  I need to water the plants      2   need; plants
3       I need to cook dinner      1           need
4            See you tommorow      2  you; tommorow
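The plain 'hello|you|...' pattern also matches inside longer words (for example, "you" inside "yours"), and the matches keep the sentence's casing ("Hello" rather than the "hello" shown in df_desiredoutput). A minimal sketch of one way to handle both, assuming whole-word matching is what is wanted: escape each word, wrap the alternation in \b word boundaries, run findall once, derive the count from the list length, and lower-case the joined result. The extra row 'Is this yours?' is a hypothetical addition to show the word-boundary behaviour.

```python
import re

import pandas as pd

# Same data as the question, plus a hypothetical row to show whole-word matching
df = pd.DataFrame({'sentence': ['Hello how are you', 'It is nice outside today',
                                'I need to water the plants', 'I need to cook dinner',
                                'See you tommorow', 'Is this yours?']})
df2 = pd.DataFrame({'words': ['hello', 'you', 'plants', 'need', 'tommorow']})

# re.escape makes regex metacharacters in the words literal; \b...\b limits matches
# to whole words, so 'you' no longer matches inside 'yours'.
# (?:...) is non-capturing, so findall returns the full matched text.
pat = r'\b(?:' + '|'.join(map(re.escape, df2['words'])) + r')\b'

# Search once: each row becomes a list of matched substrings
matches = df['sentence'].str.findall(pat, flags=re.IGNORECASE)

df['count'] = matches.str.len()  # number of matches per sentence
# Lower-case so the output uses the lower-case spelling from df2 ('hello', not 'Hello')
df['match'] = matches.str.join('; ').str.lower()
print(df)
```

Computing the matches once and taking .str.len() keeps the count and the match column consistent, since both come from the same list.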
