Python: check whether any word in a list of words matches any pattern in a list of regular expression patterns

Posted 2024-10-04 09:18:30

I have a long list of words and regular expression patterns in a .txt file, which I read like this:

with open(fileName, "r") as f1:
    pattern_list = f1.read().split('\n')

To illustrate, the first seven look like this:

^{pr2}$

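I'd like to know when a word in the input string matches any of the words/patterns in pattern_list. The code below works to some extent, but I see two problems:

  1. First, it seems quite inefficient to call re.compile() on every item in my pattern list each time I check a new input string... but when I tried storing the re.compile(raw_str) objects in a list (so that I could reuse the already-compiled regex list for something like if w in regex_compile_list:), it didn't work properly.
  2. Second, it sometimes doesn't behave the way I expect. Notice how
    • abuse* matches abusive
    • abusi* matches abused and abuse
    • ache* matches aching

What am I doing wrong, and how can I be more efficient? Thanks in advance for your patience with a newbie, and for any insight!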

import re

string_input = "People who have been abandoned or abused will often be afraid of adversarial, abusive, or aggressive behavior. They are aching to abandon the abuse and aggression."
for raw_str in pattern_list:
    pat = re.compile(raw_str)
    for w in string_input.split():
        if pat.match(w):
            print("matched:", raw_str, "with:", w)
#matched: abandon* with: abandoned
#matched: abandon* with: abandon
#matched: abuse* with: abused
#matched: abuse* with: abusive,
#matched: abuse* with: abuse
#matched: abusi* with: abused
#matched: abusi* with: abusive,
#matched: abusi* with: abuse
#matched: ache* with: aching
#matched: aching with: aching
#matched: advers* with: adversarial,
#matched: afraid with: afraid
#matched: aggress* with: aggressive
#matched: aggress* with: aggression.

3 answers

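To match shell-style wildcards, you could (ab)use the fnmatch module.

Since fnmatch is designed primarily for filename comparison, whether the test is case-sensitive depends on your operating system. So you'll have to normalize both the text and the patterns (here, I use lower() for that purpose).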

>>> import fnmatch

>>> pattern_list = ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']
>>> string_input = "People who have been abandoned or abused will often be afraid of adversarial, abusive, or aggressive behavior. They are aching to abandon the abuse and aggression."


>>> for pattern in pattern_list:
...     matches = fnmatch.filter(string_input.lower().split(), pattern.lower())
...     if matches:
...         print(pattern, "match", matches)

Produces:

^{pr2}$
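
The normalization described above could be sketched as follows. This is only an illustration, not the answerer's code: the helper name find_matches is hypothetical, and stripping surrounding punctuation (so that "abusive," compares as "abusive") is an added assumption. fnmatch.fnmatchcase is used because it never applies OS-specific case folding.

```python
import fnmatch
import string

def find_matches(text, patterns):
    """Return (pattern, word) pairs where a word matches a shell-style pattern."""
    # Lowercase and strip surrounding punctuation so "Abusive," compares as "abusive".
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    hits = []
    for pattern in patterns:
        for word in words:
            # fnmatchcase compares case-sensitively on every platform.
            if fnmatch.fnmatchcase(word, pattern.lower()):
                hits.append((pattern, word))
    return hits

pattern_list = ['abandon*', 'abuse*', 'abusi*', 'aching']
text = "They were abused; abusive people keep Aching to abandon the abuse."
for pattern, word in find_matches(text, pattern_list):
    print(pattern, "matches", word)
```

With shell-style matching, abuse* no longer matches abusive, because * here means "any run of characters" after the literal prefix.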

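As a regular expression, abandon* means "abando" followed by zero or more "n" characters, so it matches abandonnnnnnnnnnnnnnnnnnnnnnn; the * does not stand for arbitrary trailing text. You want

abandon.*

instead.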
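
To make this concrete: re.match only anchors at the start of the word, so abuse* (that is, "abus" plus zero or more "e") matches the "abus" prefix of "abusive". The sketch below rewrites each * into .*, compiles every pattern once, and uses fullmatch. The helper names compile_patterns and matching_patterns are hypothetical, and it assumes * is the only wildcard used in the file.

```python
import re

def compile_patterns(raw_patterns):
    """Compile each raw pattern once, treating '*' as 'any run of characters'."""
    compiled = []
    for raw in raw_patterns:
        if not raw:
            continue  # skip blank lines left over from split('\n')
        # Escape regex metacharacters, then turn the escaped '*' into '.*'.
        regex = re.escape(raw).replace(r'\*', '.*')
        compiled.append((raw, re.compile(regex)))
    return compiled

def matching_patterns(word, compiled):
    """Return the raw patterns that match the whole word."""
    return [raw for raw, pat in compiled if pat.fullmatch(word)]

compiled = compile_patterns(['abandon*', 'abuse*', 'abusi*', 'ache*', 'aching', ''])

# Original regex semantics: 'abuse*' is 'abus' plus zero or more 'e'.
print(bool(re.match('abuse*', 'abusive')))
print(matching_patterns('abusive', compiled))
print(matching_patterns('aching', compiled))
```

Note that a test such as if w in regex_compile_list: compares the string against the compiled pattern objects for equality, which is never true; each compiled pattern has to be asked via one of its match methods.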

If the * is always at the end of the string, you may want to do something like this:

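words = string_input.split()
for pat in pattern_list:
    for w in words:
        # A trailing * means "starts with"; otherwise require an exact match.
        if (pat.endswith('*') and w.startswith(pat[:-1])) or w == pat:
            print("matched:", pat, "with:", w)  # Do stuff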
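
If only a yes/no answer is needed, the same prefix idea can be sketched without the nested loop. This assumes every * is a trailing wildcard; the helper names split_patterns and any_match are hypothetical. It relies on str.startswith accepting a tuple of prefixes.

```python
def split_patterns(patterns):
    """Separate trailing-* patterns (prefixes) from exact words."""
    prefixes = tuple(p[:-1] for p in patterns if p.endswith('*'))
    exact = {p for p in patterns if p and not p.endswith('*')}
    return prefixes, exact

def any_match(words, prefixes, exact):
    """True if any word equals an exact pattern or starts with a prefix."""
    return any(w in exact or w.startswith(prefixes) for w in words)

prefixes, exact = split_patterns(['abandon*', 'abuse*', 'aching', 'afraid'])
print(any_match("they are aching to leave".split(), prefixes, exact))
print(any_match("nothing relevant here".split(), prefixes, exact))
```

The pattern list is split once, so each new input string costs one set lookup and one startswith call per word.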
