How can I use the re module to extract every compound word that contains a given substring?

import re

badwords = ['qanon', 'trump', 'corona', 'COVID']
string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump',Trump2020,'QAnon'"

for each in badwords:
    print(re.findall('[0-9a-zA-Z]+' + each, string, flags=re.IGNORECASE)
          + re.findall(each + '[0-9a-zA-Z]+', string, flags=re.IGNORECASE))

2 answers

User

Answer #1 · edited 2024-06-28 19:00:15

First, build a regular expression that matches any of the words in the badwords list:

import re
rex_string = "(" + "|".join(badwords) + ")" # (qanon|trump|corona|COVID)

rex = re.compile(rex_string, re.IGNORECASE)
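
As a quick check of what this pattern matches, here is a minimal sketch. The `samples` list is made up for illustration, and the use of `re.escape` is an addition that is not part of the answer above; it only matters if an entry in `badwords` contains regex metacharacters.

```python
import re

badwords = ['qanon', 'trump', 'corona', 'COVID']

# re.escape is an assumption added here, not part of the original answer:
# it protects against list entries containing characters such as "." or "+".
rex = re.compile("(" + "|".join(re.escape(w) for w in badwords) + ")", re.IGNORECASE)

# Hypothetical sample inputs, chosen only to illustrate the matching.
samples = ["'2020TRUmp'", "'whitehouse'", "'COVID-19'"]
matches = [bool(rex.search(s)) for s in samples]
print(matches)  # [True, False, True]
```

Because of `re.IGNORECASE`, `'2020TRUmp'` matches `trump` even though the casing differs.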

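Then split() your string on commas to get a list in which each element holds one compound word. Next, iterate over this list and append an element to a new list only if the regular expression does not match it. Finally, use str.join() to join the new list back into a single string.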

words_list = string.split(",")
new_list = []

for word in words_list:
    if rex.search(word) is None:
        # Didn't find a match
        new_list.append(word)

new_string = ",".join(new_list)

For an input string that also contains 'whitehouse', this gives us the string:

"'Patriots','whitehouse'"

If you prefer, you can write the loop as a one-liner:

new_list = [word for word in string.split(",") if rex.search(word) is None]

or

new_string = ",".join(word for word in string.split(",") if rex.search(word) is None)
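
The code above drops the matching words. If the goal is instead to collect the compounds that contain a bad word, as the question title suggests, the same filter can be inverted. The following is only a sketch under some assumptions: the names `items`, `matched`, and `unmatched` are illustrative, and stripping the surrounding spaces and quotes with `strip(" '")` is an extra step that the answer does not perform.

```python
import re

badwords = ['qanon', 'trump', 'corona', 'COVID']
string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump',Trump2020,'QAnon'"

# re.escape is an added safeguard, not part of the original answer.
rex = re.compile("|".join(re.escape(w) for w in badwords), re.IGNORECASE)

# Split on commas, then remove surrounding spaces and single quotes.
items = [piece.strip(" '") for piece in string.split(",")]

# Keep the compounds that contain a bad word, and separately the rest.
matched = [item for item in items if rex.search(item)]
unmatched = [item for item in items if rex.search(item) is None]

print(matched)
# ['corona2020', 'COVID-19', 'coronavirus', '2020TRUmp', 'Support2020Trump', 'Trump2020', 'QAnon']
print(unmatched)
# ['Patriots']
```

Splitting first and testing each piece returns the whole compound, including characters such as the hyphen in COVID-19 that the character class `[0-9a-zA-Z]+` in the question would not cover.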

User

Answer #2 · edited 2024-06-28 19:00:15

I convert each word to uppercase (lowercase works just as well) so that find() matches every occurrence of a bad word regardless of case.
badwords = ['qanon', 'trump', 'coronavirus', 'COVID']
string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump','whitehouse','Trump2020','QAnon','QAnon2020',TrumpQanon"

new_string = string.split(',')
for bad_word in badwords:
    new_string = [every_word for every_word in new_string
                  if every_word.upper().find(bad_word.upper()) == -1]

string_without_bad_word = ','.join(new_string)
# "'Patriots', 'corona2020','whitehouse'"
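
The same case-insensitive filtering can also be written in a single pass with the `in` operator and `any()`. This is just an alternative sketch, not the answer's own code; the names `lowered`, `kept`, and `result` are illustrative.

```python
badwords = ['qanon', 'trump', 'coronavirus', 'COVID']
string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump','whitehouse','Trump2020','QAnon','QAnon2020',TrumpQanon"

# Lowercase the bad words once instead of on every comparison.
lowered = [w.lower() for w in badwords]

# Keep a piece only if none of the bad words occurs in it.
kept = [word for word in string.split(',')
        if not any(bad in word.lower() for bad in lowered)]

result = ','.join(kept)
print(result)  # 'Patriots', 'corona2020','whitehouse'
```

Note that 'corona2020' survives here because this list contains 'coronavirus' rather than 'corona'.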
