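I have a dictionary of regular expressions, keyed by topic, and I want to count how many times each topic's regex matches in a text column. Some of the terms in the dictionary are compound (multi-word) terms.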
import pandas as pd
terms = {'animals':"(fox|russian brown deer|bald eagle|arctic fox)",
'people':'(John Adams|Rob|Steve|Superman|Super man)',
'games':'(basketball|basket ball|bball)'
}
df=pd.DataFrame({
'Score': [4,6,2,7,8],
'Foo': ['Superman was looking for a russian brown deer.', 'John adams started to play basket ball with rob yesterday before steve called him','Basketball or bball is a sport played by Steve afterschool','The bald eagle flew pass the arctic fox three times','The fox was sptted playing basket ball?']
})
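To count the matches, I can use code similar to the question "Python pandas count number of Regex matches in a string". However, that approach splits the string on whitespace and then counts the individual tokens, so compound terms are never matched. What alternative would also count compound terms whose words are separated by spaces?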
The final result should be:
                                                  Foo  Score  animals  people  games
0      Superman was looking for a russian brown deer.      4        1       1      0
1   John adams started to play basket ball with ro...      6        0       3      1
2   Basketball or bball is a sport played by Steve...      2        0       1      2
3  The bald eagle flew pass the arctic fox three t...      7        3       0      0
4              The fox was sptted playing basket ball      8        1       0      1
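Note that for row 3, the word 'fox' inside 'arctic fox' and the term 'arctic fox' itself should each be counted once (two in total) in the animals column. Together with 'bald eagle', that gives 3.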
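The overlap requirement is worth spelling out, because ordinary regex scanning does not behave this way. The short sketch below uses only the standard `re` module with the `animals` pattern and the row 3 sentence from the question. It shows that a plain `re.findall` consumes 'arctic fox' as a single match and never reports the inner 'fox'.

```python
import re

animals = "(fox|russian brown deer|bald eagle|arctic fox)"
sentence = "The bald eagle flew pass the arctic fox three times"

# re.findall scans left to right and does not reuse characters it has
# already consumed, so "arctic fox" is matched as a whole and the "fox"
# inside it is not reported as a separate match.
matches = re.findall(animals, sentence, flags=re.IGNORECASE)
print(matches)       # ['bald eagle', 'arctic fox']
print(len(matches))  # 2, not the 3 the question asks for
```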
Please see whether this is what you want:
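One possible approach, offered as a sketch and not necessarily the code originally posted here, is to wrap each pattern in a zero-width lookahead, `(?=...)`. A lookahead consumes no characters, so the regex engine tries every starting position in the string, and 'fox' is still found after 'arctic fox' has matched. The helper name `count_overlapping` is hypothetical. Case-insensitive matching is an assumption, inferred from the expected output (for example 'John adams' and 'rob' count as people).

```python
import re
import pandas as pd

terms = {
    'animals': "(fox|russian brown deer|bald eagle|arctic fox)",
    'people': '(John Adams|Rob|Steve|Superman|Super man)',
    'games': '(basketball|basket ball|bball)',
}
df = pd.DataFrame({
    'Score': [4, 6, 2, 7, 8],
    'Foo': [
        'Superman was looking for a russian brown deer.',
        'John adams started to play basket ball with rob yesterday before steve called him',
        'Basketball or bball is a sport played by Steve afterschool',
        'The bald eagle flew pass the arctic fox three times',
        'The fox was sptted playing basket ball?',
    ],
})


def count_overlapping(text, pattern):
    """Count the start positions in `text` at which `pattern` matches.

    Hypothetical helper: the pattern is wrapped in a lookahead so that
    matches may overlap; matching is case-insensitive by assumption.
    """
    lookahead = re.compile(f"(?={pattern})", flags=re.IGNORECASE)
    return sum(1 for _ in lookahead.finditer(text))


# Add one count column per topic, matching against the whole sentence
# instead of whitespace-separated tokens.
for topic, pattern in terms.items():
    df[topic] = [count_overlapping(text, pattern) for text in df['Foo']]

print(df[['Score', 'animals', 'people', 'games']])
```

Two limitations of this sketch: the patterns have no word boundaries, so 'Rob' would also match inside a word such as 'Robert', and two alternatives that begin at the same character position are counted only once, because each start position yields at most one match.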