How do I remove every compound word that contains a given substring, using the re module?

Posted 2024-06-28 19:00:15


string= "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump','whitehouse','Trump2020','QAnon','QAnon2020',TrumpQanon"

badwords = ['qanon', 'trump', 'corona', 'COVID']

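If a compound word in string contains any entry of badwords as a substring, that compound word must be removed from string. For example, badwords contains COVID, so COVID-19 should be removed from string.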

I tried using the re module like this, but it failed:

import re

badwords = ['qanon', 'trump', 'corona', 'COVID']
string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump',Trump2020,'QAnon'"
for each in badwords:
    print(re.findall ('[0-9a-zA-Z]+'+each,string,flags=re.IGNORECASE)+\
          re.findall (each+'[0-9a-zA-Z]+',string,flags=re.IGNORECASE))

What I want: a new string, "'Patriots','whitehouse'", should be returned.


Tags: string, re, each, trump, patriots, covid, coronavirus
2 Answers

First, build a regular expression that matches any of the words in the badwords list:

import re
rex_string = "(" + "|".join(badwords) + ")" # (qanon|trump|corona|COVID)

rex = re.compile(rex_string, re.IGNORECASE)
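Joining the words directly works for this list, but a bad word containing regex metacharacters (for example "c++" or "a.b") would change the meaning of the pattern. A minimal sketch, assuming that might happen, which escapes each word with re.escape before joining; build_pattern is a hypothetical helper name, not part of the answer above:

```python
import re

def build_pattern(badwords):
    # Escape each word so characters such as "." or "+" are matched literally.
    alternation = "|".join(re.escape(word) for word in badwords)
    return re.compile(alternation, re.IGNORECASE)

rex = build_pattern(['qanon', 'trump', 'corona', 'COVID'])
print(rex.search("'2020TRUmp'") is not None)   # True
print(rex.search("'whitehouse'") is not None)  # False
```

For the four words in the question the escaped pattern is identical to the unescaped one, so this only matters for less tidy word lists.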

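Then split() your string on commas to get a list in which each element holds one compound word. Next, iterate over this list and, whenever the regular expression does not match an element, append that element to a new list. Finally, use str.join() to join the new list back into a single string.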

words_list = string.split(",")
new_list = []

for word in words_list:
    if rex.search(word) is None:
        # Didn't find a match
        new_list.append(word)

new_string = ",".join(new_list)

This gives us the string:

"'Patriots','whitehouse'"

If you are so inclined, you can write the loop as a one-liner:

new_list = [word for word in string.split(",") if rex.search(word) is None]

Or

new_string = ",".join(word for word in string.split(",") if rex.search(word) is None)
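Put together, the steps above can be sketched as one self-contained function. The name remove_badwords is hypothetical, and two details are assumptions beyond the answer: each element is stripped of surrounding whitespace (so a surviving item does not keep the space that follows a comma), and an empty badwords list returns the input unchanged, because an empty pattern would otherwise match every element.

```python
import re

def remove_badwords(string, badwords):
    # Assumption: with no bad words there is nothing to remove.
    if not badwords:
        return string
    rex = re.compile("|".join(re.escape(w) for w in badwords), re.IGNORECASE)
    # Split on commas and drop the whitespace around each element.
    items = (item.strip() for item in string.split(","))
    # Keep only the elements in which no bad word is found.
    return ",".join(item for item in items if rex.search(item) is None)

string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump','whitehouse','Trump2020','QAnon','QAnon2020',TrumpQanon"
badwords = ['qanon', 'trump', 'corona', 'COVID']

result = remove_badwords(string, badwords)
print(result)  # 'Patriots','whitehouse'
```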

I convert every word to uppercase (lowercase would work just as well), so that find() matches each word regardless of differences in case.

badwords = ['qanon', 'trump', 'corona', 'COVID']  # same list as in the question
string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump','whitehouse','Trump2020','QAnon','QAnon2020',TrumpQanon"
words = string.split(',')

for bad_word in badwords:
    # Keep only the words that do not contain bad_word, ignoring case
    words = [every_word for every_word in words if every_word.upper().find(bad_word.upper()) == -1]

string_without_bad_word = ','.join(words)  # "'Patriots','whitehouse'"
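The same case-insensitive substring check can also be written without rebuilding the list once per bad word. A sketch using str.casefold() and any(), with the badwords list from the question; contains_badword is a hypothetical helper name:

```python
def contains_badword(word, badwords):
    # casefold() is a more aggressive lower() intended for caseless comparison.
    folded = word.casefold()
    return any(bad.casefold() in folded for bad in badwords)

badwords = ['qanon', 'trump', 'corona', 'COVID']
string = "'Patriots', 'corona2020','COVID-19','coronavirus','2020TRUmp','Support2020Trump','whitehouse','Trump2020','QAnon','QAnon2020',TrumpQanon"

kept = [word for word in string.split(',') if not contains_badword(word, badwords)]
string_without_bad_word = ','.join(kept)
print(string_without_bad_word)  # 'Patriots','whitehouse'
```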
