Count words, but ignore matches preceded by a capitalized word

import pandas as pd

data = {'text': ["Kevin McDonald has bought a burger.", "The best burger in McDonald is cheeze buger."]}
df = pd.DataFrame(data)
long_list = ['McDonald', 'Five Guys']
# match any of the words; the non-capturing group makes \b apply to every alternative
pattern = r'\b(?:{})\b'.format('|'.join(long_list))
df['count'] = df.text.str.count(pattern)
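To make the starting point concrete, here is a minimal sketch of what the plain word-boundary count returns on the two sample sentences. It uses the standard `re` module instead of pandas, on the assumption that `Series.str.count` gives the same per-row counts for this pattern; `plain_counts` is just an illustrative name.

```python
import re

texts = [
    "Kevin McDonald has bought a burger.",
    "The best burger in McDonald is cheeze buger.",
]
long_list = ['McDonald', 'Five Guys']

# Escape each name and group the alternatives so \b applies to all of them.
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, long_list)))

# Count non-overlapping matches per sentence.
plain_counts = [len(re.findall(pattern, t)) for t in texts]
print(plain_counts)  # [1, 1]
```

Both rows count 1 here, including the "Kevin McDonald" row, which is the match that should be ignored.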

2 answers

User

#1 · edited 2024-09-30 06:11:15

You can try the following pattern:

pattern = r'\b[a-z]\w* (?:{})'.format('|'.join(long_list))

df['count'] = df.text.str.count(pattern)
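As a rough check of this idea, the sketch below counts a name only when the word right before it starts with a lowercase letter. It uses a slightly tightened form of the pattern (`\w*` for the rest of the preceding word, and the names wrapped in a non-capturing group so the prefix applies to each of them). The helper `count_after_lowercase` is a hypothetical name, and plain `re` stands in for `Series.str.count`.

```python
import re

long_list = ['McDonald', 'Five Guys']
names = '|'.join(map(re.escape, long_list))

# A word starting with a lowercase letter, one space, then one of the names.
pattern = r'\b[a-z]\w* (?:{})'.format(names)

def count_after_lowercase(text):
    """Count names that directly follow a lowercase-initial word."""
    return len(re.findall(pattern, text))

print(count_after_lowercase("Kevin McDonald has bought a burger."))
print(count_after_lowercase("The best burger in McDonald is cheeze buger."))
print(count_after_lowercase("McDonald's restaurants."))
```

The third call returns 0 because nothing precedes the name, so this approach also drops matches at the start of a string.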

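User

#2 · edited 2024-09-30 06:11:15

IIUC, the goal is to skip a match when the word before it is capitalized. Requiring a lowercase word before the match would rule out many legitimate cases, such as a match at the start of the string or right after punctuation.

Here is a regex that accepts more of those cases (start of the string, or preceded by a non-word character):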

regex = '|'.join(fr'(?:\b[^A-Z]\S*\s+|[^\w\s] ?|^){i}' for i in long_list)
df['count'] = df['text'].str.count(regex)

For example:

                                           text  count
0           Kevin McDonald has bought a burger.      0
1  The best burger in McDonald is cheeze buger.      1
2                       McDonald's restaurants.      1
3                 Blah. McDonald's restaurants.      1
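To see how the three alternatives in that regex map onto the rows above, here is a commented sketch of the same pattern written with `re.VERBOSE`. It is a restatement under assumptions, not the answer's exact code: the names are escaped with `re.escape` and factored into one group, the optional space is written `[ ]?` because verbose mode ignores bare spaces, and `count_names` is a hypothetical helper.

```python
import re

long_list = ['McDonald', 'Five Guys']
names = '|'.join(map(re.escape, long_list))

regex = re.compile(rf'''
    (?:
        \b[^A-Z]\S*\s+   # a preceding token not starting with an uppercase letter, then whitespace
      | [^\w\s][ ]?      # or a punctuation character, optionally followed by one space
      | ^                # or the start of the string
    )
    (?:{names})          # followed by one of the names
''', re.VERBOSE)

def count_names(text):
    """Count names whose preceding context is accepted by the regex."""
    return len(regex.findall(text))

rows = [
    "Kevin McDonald has bought a burger.",
    "The best burger in McDonald is cheeze buger.",
    "McDonald's restaurants.",
    "Blah. McDonald's restaurants.",
]
counts = [count_names(t) for t in rows]
print(counts)  # [0, 1, 1, 1]
```

Without `re.MULTILINE`, `^` only matches at the very start of each string, which is what a per-row count needs.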

You can test and explore the regex here
