Regex to exclude certain words while matching others

Posted 2024-06-26 00:19:18


I am trying to match the following in Python (the re module) using a regular expression:

"...milk..."              => matched ['milk']

"...almondmilk..." = no match
"...almond milk..." = no match
"...almond word(s) milk..." => matched ['milk']
"...almondword(s)milk..." => matched ['milk']


"...soymilk..." = no match
"...soy milk..." = no match
"...soy word(s) milk..." => matched ['milk']
"...soyword(s)milk..." => matched ['milk']

My other requirement is to find all matches in a given string, so I am using re.findall().

I used the answers to this question (and looked at many other SO pages) to construct my regex:

regx = '^(?!.*(soy|almond))(?=$|.*(milk)).*'

But when I test it with a simple example, I get incorrect behavior:

>>> food = "is combined with creamy soy and milk. a fruity and refreshing sip of spring, "
>>> re.findall(regx, food)
[]
>>> food = "is combined with creamy milk. a fruity and refreshing sip of spring, "
>>> re.findall(regx, food)
[('', 'milk')]

Both of these should return ['milk']. In addition, if the string contains several instances of milk, I get only one result instead of two:

>>> food = "is combined with creamy milk. a fruity and refreshing sip of milk, "
>>> re.findall(regx, food)
[('', 'milk')]

What am I doing wrong in my regex, and how should I adjust it to solve this problem?


2 Answers

This regex works for me:

(?:soy|almond)\s?[\w\(\)]+\s?(milk)

Or, if the words in between should not be allowed to contain parentheses:

(?:soy|almond)\s?\w+\s?(milk)

In Python, it would look like this:

import re

matches = re.findall(r'(?:soy|almond)\s?[\w\(\)]+\s?(milk)', your_text)
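
To see what this pattern returns in practice, here is a small check against the two sample strings from the question. The variable names are illustrative only. Note that the pattern requires soy or almond followed by at least one intervening word character before milk, so a string that contains neither word produces no match.

```python
import re

# Pattern from the answer above: soy/almond, an intervening word, then milk.
pattern = r'(?:soy|almond)\s?[\w\(\)]+\s?(milk)'

with_soy = "is combined with creamy soy and milk. a fruity and refreshing sip of spring, "
plain = "is combined with creamy milk. a fruity and refreshing sip of spring, "

# "soy and milk" satisfies the pattern, so group 1 ("milk") is returned.
result_with_soy = re.findall(pattern, with_soy)
# No soy/almond prefix is present here, so nothing matches.
result_plain = re.findall(pattern, plain)

print(result_with_soy)
print(result_plain)
```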

You can exclude soymilk, soy milk, almondmilk and almond milk by matching them, and capture only milk in a capture group, which is what re.findall returns.

\b(?:soy|almond)\s?milk\b|\b(milk)\b

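The pattern matches:

  • \b A word boundary to prevent a partial word match
  • (?:soy|almond) Match either soy or almond
  • \s?milk\b Match an optional whitespace char and milk, followed by a word boundary
  • | Or
  • \b(milk)\b Capture milk between word boundaries in group 1

You can also use [^\S\r\n] instead of \s to match whitespace without newlines, because \s can also match a newline.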
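
The difference between \s and [^\S\r\n] only shows up when a line break separates the two words. A minimal sketch, using a made-up input string:

```python
import re

with_s = r"\b(?:soy|almond)\s?milk\b|\b(milk)\b"
no_newline = r"\b(?:soy|almond)[^\S\r\n]?milk\b|\b(milk)\b"

text = "soy\nmilk"  # hypothetical input: the two words sit on separate lines

# \s? accepts the newline, so the first alternative matches and group 1 is empty.
across_lines = re.findall(with_s, text)
# The newline is not accepted, so only the second alternative matches milk.
same_line_only = re.findall(no_newline, text)

print(across_lines)
print(same_line_only)
```

Which behavior is wanted depends on whether a soy at the end of one line should suppress a milk at the start of the next.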


For example:

import re

regx = r"\b(?:soy|almond)\s?milk\b|\b(milk)\b"

food = "is combined with creamy soy and milk. a fruity and refreshing sip of spring, "
print(re.findall(regx, food))

food = "is combined with creamy milk. a fruity and refreshing sip of spring, "
print(re.findall(regx, food))

Output

['milk']
['milk']
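
One thing to watch for with this alternation: when the first alternative matches (for example soy milk), group 1 does not participate, and re.findall reports an empty string for that match. A small sketch, using a made-up sentence, that drops those empty entries and also shows several milk hits being returned:

```python
import re

regx = r"\b(?:soy|almond)\s?milk\b|\b(milk)\b"

# Hypothetical sample with two excluded forms and two standalone occurrences.
food = "soy milk and milk and almondmilk, then more milk"

# Every match contributes one entry; excluded forms contribute "".
raw = re.findall(regx, food)
# Keep only the entries where group 1 actually captured something.
hits = [m for m in raw if m]

print(raw)
print(hits)
```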

Another option is to use the PyPI regex module, which supports the variable-length lookbehind used here:

(?<!\b(?:soy|almond)\s*(?:milk)?)\bmilk\b

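The pattern matches:

  • (?<! Negative lookbehind, assert that what is directly to the left is not
  • \b(?:soy|almond) A word boundary, then match either soy or almond
  • \s*(?:milk)? Match optional whitespace chars, then optionally milk
  • ) Close the lookbehind
  • \bmilk\b Match milk between word boundaries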
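
The standard re module rejects this pattern because its lookbehinds must have a fixed width. If installing the third-party module is not an option, one possible stdlib-only approximation is sketched below. It is a different technique, not the pattern from this answer: it uses one fixed-width negative lookbehind per excluded word, and it only covers exactly one whitespace character between the words. The sample text is made up.

```python
import re

# The variable-width lookbehind from the answer; re is expected to refuse it.
variable_width = r"(?<!\b(?:soy|almond)\s*(?:milk)?)\bmilk\b"
try:
    re.compile(variable_width)
    compiled_with_re = True
except re.error:
    compiled_with_re = False

# Assumption: a single whitespace char separates the words, so each
# lookbehind has a fixed width (4 chars for "soy ", 7 for "almond ").
fixed_width = r"(?<!\bsoy\s)(?<!\balmond\s)\bmilk\b"

text = "soy milk, almond milk, soy and milk, milk"  # hypothetical sample
found = re.findall(fixed_width, text)

print(compiled_with_re)
print(found)
```

Joined forms such as soymilk are already excluded by the leading \b, since no word boundary exists between soy and milk.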
