Python pandas: counting regex matches in a string when the terms include compound words

Posted 2024-09-28 21:48:24


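I have a dictionary of regular expressions, one per topic, and I want to count how many matches each topic's regex has in a column of strings. Some of the terms in the regexes are compound words (several words separated by spaces).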

import pandas as pd


terms = {'animals':"(fox|russian brown deer|bald eagle|arctic fox)",
'people':'(John Adams|Rob|Steve|Superman|Super man)',
'games':'(basketball|basket ball|bball)'
}

df=pd.DataFrame({
'Score': [4,6,2,7,8],
'Foo': ['Superman was looking for a russian brown deer.', 'John adams started to play basket ball with rob yesterday before steve called him','Basketball or bball is a sport played by Steve afterschool','The bald eagle flew pass the arctic fox three times','The fox was sptted playing basket ball?']
})

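To count the matches, I could use code like that in the question "Python pandas count number of Regex matches in a string". However, that approach splits each string on whitespace before counting, so compound terms are never matched. Is there an alternative that also counts compound terms whose words are joined by spaces?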


The final result should be:

                                                 Foo  Score  animals  people  games
0     Superman was looking for a russian brown deer.      4        1       1      0
1  John adams started to play basket ball with ro...      6        0       3      1
2  Basketball or bball is a sport played by Steve...      2        0       1      2
3  The bald eagle flew pass the arctic fox three ...      7        3       0      0
4            The fox was sptted playing basket ball?      8        1       0      1

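Note that for row 3, the word "fox" inside "arctic fox" and the term "arctic fox" itself should each be counted once (two in total) toward the animals column.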
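The overlap requirement above is worth illustrating, because a single pass with the whole alternation cannot satisfy it: regex matches do not overlap, so once "arctic fox" is consumed the inner "fox" is no longer available. The following is a minimal sketch on one sentence from the question; the variable names `single_pass`, `alternatives` and `per_term` are made up for the illustration, and it assumes the pattern is a flat `(a|b|c)` list of literal terms.

```python
import re

text = "The bald eagle flew pass the arctic fox three times"
pattern = "(fox|russian brown deer|bald eagle|arctic fox)"

# One pass with the whole alternation: matches cannot overlap,
# so the 'fox' inside 'arctic fox' is not reported separately.
single_pass = re.findall(pattern, text, flags=re.IGNORECASE)

# One pass per alternative: each term is searched independently,
# so 'fox' and 'arctic fox' are both found.
# Assumption: the pattern is a flat "(a|b|c)" list of literal terms.
alternatives = pattern.strip("()").split("|")
per_term = {
    term: len(re.findall(re.escape(term), text, flags=re.IGNORECASE))
    for term in alternatives
}
per_term_total = sum(per_term.values())

print(single_pass)     # ['bald eagle', 'arctic fox']
print(per_term_total)  # 3
```

Searching term by term is what produces the expected 3 for the animals column in row 3.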


1 answer
Answer 1 · posted 2024-09-28 21:48:24

See whether this is what you are looking for:

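import re

# terms and df are the dictionary and DataFrame from the question
for k in terms:
    df[k] = 0
    # Drop the parentheses, then test each alternative on its own
    for words in re.sub("[()]", "", terms[k]).split('|'):
        # True for rows that contain this term, ignoring case
        mask = df.Foo.str.contains(words, case=False)
        df[k] += mask
df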


                                                 Foo  Score  people  animals  games
0     Superman was looking for a russian brown deer.      4       1        1      0
1  John adams started to play basket ball with ro...      6       3        0      1
2  Basketball or bball is a sport played by Steve...      2       1        0      2
3  The bald eagle flew pass the arctic fox three ...      7       0        3      0
4            The fox was sptted playing basket ball?      8       0        1      1
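One thing to keep in mind is that `str.contains` only records whether a term is present, so a term that appears twice in the same row still adds 1. If repeated occurrences should be counted, a variant along the following lines might work. This is a sketch, not part of the answer: `count_terms` is a hypothetical helper, and it assumes each pattern is a flat `(a|b|c)` list of literal terms, as in the question's `terms` dictionary.

```python
import re

def count_terms(text, pattern):
    """Sum the occurrences of every alternative in `pattern`, each searched separately."""
    # Assumption: pattern is a flat "(a|b|c)" list of literal terms
    alternatives = pattern.strip("()").split("|")
    return sum(
        len(re.findall(re.escape(term), text, flags=re.IGNORECASE))
        for term in alternatives
    )

# Dictionary copied from the question
terms = {
    'animals': "(fox|russian brown deer|bald eagle|arctic fox)",
    'people': '(John Adams|Rob|Steve|Superman|Super man)',
    'games': '(basketball|basket ball|bball)',
}

# Second row of the question's Foo column
row = 'John adams started to play basket ball with rob yesterday before steve called him'
counts = {topic: count_terms(row, pat) for topic, pat in terms.items()}
print(counts)  # {'animals': 0, 'people': 3, 'games': 1}
```

With the question's DataFrame, something like `df[k] = df['Foo'].map(lambda s: count_terms(s, terms[k]))` for each key `k` should fill the columns. The search is by substring, so "Rob" would also match inside "robot"; wrapping each term in `\b` word boundaries is one way to tighten that if it matters.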
