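I want to determine whether a cell contains the word "McDonald". However, I want to ignore cases where the word before "McDonald" starts with a capital letter, as in "Kevin McDonald". Any suggestions on how to do this with a regex in a DataFrame?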
import pandas as pd

data = {'text': ["Kevin McDonald has bought a burger.",
                 "The best burger in McDonald is cheeze buger."]}
df = pd.DataFrame(data)
long_list = ['McDonald', 'Five Guys']
# match any of the words; the non-capturing group makes \b apply to every alternative
pattern = r'\b(?:{})\b'.format('|'.join(long_list))
df['count'] = df.text.str.count(pattern)
                                           text
0           Kevin McDonald has bought a burger.
1  The best burger in McDonald is cheeze buger.
Expected output:
                                           text  count
0           Kevin McDonald has bought a burger.      0
1  The best burger in McDonald is cheeze buger.      1
You can try the following pattern:
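A minimal sketch of one such pattern, assuming the rule is "count the keyword only when the word before it starts with a lowercase ASCII letter". The pattern itself is an assumption for illustration; `long_list` and the sample sentences come from the question, and plain `re` is used so the snippet runs without pandas.

```python
import re

long_list = ['McDonald', 'Five Guys']

# Escape each keyword so it is matched literally, then join as alternatives.
alternatives = '|'.join(map(re.escape, long_list))

# Assumed pattern: a word starting with a lowercase letter, whitespace,
# then one of the keywords as a whole word.
pattern = r'\b[a-z]\w*\s+(?:{})\b'.format(alternatives)

texts = [
    "Kevin McDonald has bought a burger.",
    "The best burger in McDonald is cheeze buger.",
]

# Count the matches per sentence.
counts = [len(re.findall(pattern, t)) for t in texts]
print(counts)  # [0, 1]
```

The same pattern string can be passed to the question's `df.text.str.count(pattern)`. Note that it misses a keyword at the very start of the text or right after punctuation, because it always requires a preceding lowercase word.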
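IIUC, the goal is to avoid a match when the keyword is preceded by a capitalized word. Requiring a non-capitalized word before it would rule out many legitimate cases.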
Here is a regex that allows more possibilities (start of the sentence, or preceded by a non-word character):
For example:
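A minimal sketch of a more permissive pattern with a few test sentences. The pattern is an assumption for illustration: it accepts the keyword at the start of the string, after a non-word character such as punctuation, or after a word that does not start with an ASCII capital. The last two sentences are made-up examples, not part of the question's data.

```python
import re

long_list = ['McDonald', 'Five Guys']
alternatives = '|'.join(map(re.escape, long_list))

# Assumed pattern: the keyword may be preceded by
#   ^                   the start of the string,
#   [^\w\s]\s*          a non-word character such as punctuation, or
#   \b[^\WA-Z]\w*\s+    a word that does not start with an ASCII capital.
pattern = r'(?:^|[^\w\s]\s*|\b[^\WA-Z]\w*\s+)(?:{})\b'.format(alternatives)

examples = [
    "Kevin McDonald has bought a burger.",           # after a capitalized word
    "The best burger in McDonald is cheeze buger.",  # after a lowercase word
    "McDonald is open late.",                        # start of the string
    "We were hungry. McDonald was closed.",          # after punctuation
]

counts = [len(re.findall(pattern, t)) for t in examples]
print(counts)  # [0, 1, 1, 1]
```

Alternation is used instead of a negative lookbehind because Python's built-in `re` module only supports fixed-width lookbehind, and "a capitalized word of any length" is not fixed-width. The pattern can be passed to `df.text.str.count(pattern)` as in the question; it only treats ASCII `A-Z` as capitals.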
You can test and explore the regex here.