How does excluding a string with regex actually work?

Posted 2024-09-25 16:22:28


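I don't understand how negative regular expressions work. I followed several posts (post 1, post 2) and used their patterns, which work fine, but their explanations don't make sense to me. I also tried several regex testing sites, such as regex101, but they could not handle patterns that seem to work fine in Python.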

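What I would prefer is for regex to behave the same way with negative logic as it does with positive logic. It seems to me, however, that as soon as negation is involved, the regex starts being processed in a completely different way that is hard to follow. I know there are workarounds, but I am interested in understanding how to do this with a regular expression.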

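Goal of the example below: suppose I have a list of commodities, and I want to get from it the list of those that are not "gas", as defined by the variable expect. In other words, I need the list of products whose names do not contain the word "gas".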
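For reference, the workaround the question sets aside can be sketched as follows: match "gas" positively and negate the result in Python rather than inside the pattern. This is only a baseline for comparison, not the regex-only answer being asked for; the name no_gas is introduced here for illustration.

```python
import re

cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']

# Positive, case-insensitive match for "gas"; the negation happens in Python
# ("not re.search(...)"), not inside the regular expression.
no_gas = [c for c in cmdty if not re.search(r'gas', c, flags=re.IGNORECASE)]
print(no_gas)  # ['Crude Oil', 'Brent', 'WTI']
```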

Here is some helper code for trying out different ideas:

import re

cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']
expect = cmdty[-3:]  # i.e. ['Crude Oil', 'Brent', 'WTI']
print(f'Starting list: {cmdty}. Would like to get: {expect}')

def check(pattern, cmdty=cmdty, expect=expect, comment=""):
    # Keep every item in which the pattern matches somewhere (re.search).
    out = [c for c in cmdty if re.search(pattern, c)]
    good = "yes" if set(out) == set(expect) else "no"
    print(f'pattern={pattern:20}: worked: {good:>3}. output={out}. comment: {comment}')

Various regex attempts at making it work:

check(pattern='(?i)(?=gas)',comment="This one works, but requires negating the results")
check(pattern='(?i)(?!gas)',comment="My hope was that this would work")
check('(?i)(?:!gas)',comment="")
check(r'(?i)\s(?!gas)',comment="strange outcome")
check('(?i).*(?!gas).*')
check('^(?i)(?!.*gas).*$', comment='works')
check('^(?i)((?!gas).)*$', comment='not sure this one works')
check('(?i)^.*(?!gas).*$',comment="I'd expect this one to work, but does not")
check('(?i)^(?!.*gas).*$', comment='works')
check('(?i)nat(?!gas)', comment='makes sense, but super odd')
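One caveat about the attempts that start with '^(?i)': since Python 3.11, a global inline flag such as (?i) must appear at the very start of the pattern, otherwise re raises an error (earlier versions only emitted a DeprecationWarning). A minimal sketch that sidesteps the issue passes the flag as an argument instead. The helper name without_word is hypothetical and not part of the original code.

```python
import re

cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']

def without_word(items, word):
    # Hypothetical helper: keep items in which `word` appears nowhere.
    # The flag is passed via `flags=` rather than an inline (?i), so its
    # position inside the pattern is no longer a concern.
    pattern = re.compile(rf'^(?!.*{re.escape(word)}).*$', flags=re.IGNORECASE)
    return [c for c in items if pattern.search(c)]

result = without_word(cmdty, 'gas')
print(result)  # ['Crude Oil', 'Brent', 'WTI']
```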

Starting list and goal:

Starting list: ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI'].

Would like to get: ['Crude Oil', 'Brent', 'WTI']

Below is the output of the various attempts. How should I think about this so that it makes sense?

pattern=(?i)(?=gas)         : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract']. comment: This one works, but requires negating the results
pattern=(?i)(?!gas)         : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: My hope was that this would work
pattern=(?i)(?:!gas)        : worked:  no. output=[]. comment: 
pattern=(?i)\s(?!gas)       : worked:  no. output=['Henry Hub Natural Gas Contract', 'Crude Oil']. comment: strange outcome
pattern=(?i).*(?!gas).*     : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: 
pattern=^(?i)(?!.*gas).*$   : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: works
pattern=^(?i)((?!gas).)*$   : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: not sure this one works
pattern=(?i)^.*(?!gas).*$   : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: I'd expect this one to work, but does not
pattern=(?i)^(?!.*gas).*$   : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: works
pattern=(?i)nat(?!gas)      : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract']. comment: makes sense, but super odd
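One way to make sense of these results is to look at where each pattern matches, not just whether it matches. The sketch below inspects a few of the patterns on 'natural gas'. The capture groups in the second pattern are added here purely to expose what each .* consumed; they are not part of the original attempts.

```python
import re

text = 'natural gas'

# (?!gas) consumes nothing. re.search tries positions left to right and stops
# at the first one where "gas" does not start, which is position 0.
m1 = re.search(r'(?i)(?!gas)', text)
print(m1.span())    # (0, 0)

# The greedy first .* runs to the end of the string; the lookahead is then
# tested at the end, where nothing follows, so it succeeds.
m2 = re.search(r'(?i)^(.*)(?!gas)(.*)$', text)
print(m2.groups())  # ('natural gas', '')

# Anchored at the start, the lookahead scans the whole string via .* and
# fails if "gas" occurs anywhere, so there is no match.
m3 = re.search(r'(?i)^(?!.*gas).*$', text)
print(m3)           # None

# Here the lookahead is tested before every single character; it fails at the
# "g" of "gas", so the repetition can never reach the end of the string.
m4 = re.search(r'(?i)^((?!gas).)*$', text)
print(m4)           # None
```

Note also that (?:!gas) is not a lookahead at all: (?:...) is a non-capturing group, so that pattern looks for the literal text "!gas", which is why it returned an empty list.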
