Find a substring in a block of text, unless it is part of another substring

Posted 2024-09-28 01:32:55


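I'm looking for an efficient way to find the substring between two expressions, unless the starting expression is itself part of another expression.

For example: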

Once upon a time, in a time far far away, dogs ruled the world. The End.

If I search for the substring between "time" and "End", I get either:

in a time far far away, dogs ruled the world. The

or

far far away, dogs ruled the world. The

I want to ignore "time" when it is part of "Once upon a time". Is there a Pythonic way to do this without crazy for loops and if/else cases?


3 Answers

Just remove "Once upon a time" and check what is left:

my_string = 'Once upon a time, in a time far far away, dogs ruled the world. The End.'
if 'time' in my_string.replace('Once upon a time', ''):
    pass
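The snippet above only tests whether "time" is still present. A minimal sketch of how the same idea could be extended to actually extract the text is shown here; the helper name `between_ignoring` and its parameters are hypothetical, not part of the original answer, and it only returns the first match.

```python
def between_ignoring(text, start, end, ignored):
    """Delete `ignored` from `text`, then return what lies between
    the first `start` and the next `end` (or None if either is missing)."""
    cleaned = text.replace(ignored, "")
    begin = cleaned.find(start)
    if begin == -1:
        return None
    begin += len(start)              # skip past the start marker itself
    stop = cleaned.find(end, begin)  # look for the end marker after it
    if stop == -1:
        return None
    return cleaned[begin:stop]

sample = "Once upon a time, in a time far far away, dogs ruled the world. The End."
extracted = between_ignoring(sample, "time", "End", "Once upon a time")
print(repr(extracted))
```

Because the phrase is deleted before searching, any position reported by `find` refers to the cleaned copy, not to the original string.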

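This is possible with a regex, using a negative lookahead. The group `(?:(?!time).)*` consumes characters one at a time and stops before any further "time", so the match always starts at the last "time" before "End":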

>>> import re
>>> s = 'Once upon a time, in a time far far away, dogs ruled the world. The End.'
>>> pattern = r'time((?:(?!time).)*)End'
>>> re.findall(pattern, s)
[' far far away, dogs ruled the world. The ']

With multiple matches:

>>> s = 'a time b End time c time d End time'
>>> re.findall(pattern, s)
[' b ', ' d ']
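The same lookahead pattern can be built for arbitrary delimiters. The following is a sketch under a few assumptions: the function name `innermost_between` is hypothetical, `re.escape` is added so the delimiters are treated as literal text, and `re.DOTALL` is added so a match may span line breaks (neither appears in the original answer).

```python
import re

def innermost_between(text, start, end):
    """Return every span between `start` and `end` that contains
    no further occurrence of `start`."""
    s = re.escape(start)
    e = re.escape(end)
    # (?:(?!s).)* matches one character at a time, refusing to step
    # onto a position where `start` begins again.
    pattern = re.compile(rf"{s}((?:(?!{s}).)*){e}", re.DOTALL)
    return pattern.findall(text)

story = "Once upon a time, in a time far far away, dogs ruled the world. The End."
print(innermost_between(story, "time", "End"))
print(innermost_between("a time b End time c time d End time", "time", "End"))
```

Note that this approach does not look for "Once upon a time" specifically. It simply prefers the closest "time" before each "End", which happens to give the wanted result for the example sentence.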

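The typical solution here is to combine alternation with a capturing group. Because regex alternation is tried from left to right, put every exception first in the pattern (outside any capturing group) and end with the alternative you actually want to select, which carries the capturing group. When an exception matches, it consumes that text and contributes only an empty string to the results.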

import re

text = "Once upon a time, in a time far far away, dogs ruled the world. The End."
query = re.compile(r"""
  Once\ upon\ a\ time|         # literally 'Once upon a time' (spaces are escaped
                               # because re.X ignores unescaped whitespace);
                               # should not be selected
  time\b                       # from the word 'time'
  (.*)                         # capture everything
  \bend                        # until the word 'end'
""", re.X | re.I)

result = query.findall(text)
# result = ['', ' far far away, dogs ruled the world. The ']

You can drop the empty strings (produced whenever the unwanted phrase is what matched):

result = list(filter(None, result))
# or result = [r for r in result if r]
# [' far far away, dogs ruled the world. The ']

Then strip the surrounding whitespace from the results:

result = list(map(str.strip, filter(None, result)))
# or result = [r.strip() for r in result if r]
# ['far far away, dogs ruled the world. The']

This solution is especially useful when you have many phrases to avoid:

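phrases = ["Once upon a time", "No time like the present", "Time to die", "All we have left is time"]
querystring = r"time\b(.*)\bend"
# re.escape makes each phrase match literally; the phrases come first so they win the alternation
query = re.compile("|".join(map(re.escape, phrases)) + "|" + querystring, re.I)

# some_text stands for whatever string you want to search
result = [r.strip() for r in query.findall(some_text) if r]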
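For completeness, here is a self-contained sketch of the multi-phrase version that can be run as is. The function name `between_excluding`, the constant `TARGET`, and the sample text are all made up for illustration. It also swaps the greedy `(.*)` for the lazy `(.*?)`, which is a deliberate change from the answer above so that each match stops at the nearest "end" when the text contains several.

```python
import re

# Same shape as the answer's querystring, but with a lazy quantifier (assumption).
TARGET = r"time\b(.*?)\bend"

def between_excluding(text, phrases):
    """Return the stripped spans between 'time' and 'end', skipping any
    'time' that belongs to one of the excluded phrases."""
    # Excluded phrases go first so they win the left-to-right alternation.
    alternatives = [re.escape(p) for p in phrases] + [TARGET]
    query = re.compile("|".join(alternatives), re.I)
    # group(1) is None when an excluded phrase matched instead of TARGET.
    return [m.group(1).strip() for m in query.finditer(text) if m.group(1)]

phrases = ["Once upon a time", "No time like the present"]
sample_text = (
    "Once upon a time, in a time far far away, dogs ruled the world. The End. "
    "No time like the present. Time to go, my friend. The end."
)
found = between_excluding(sample_text, phrases)
print(found)
```

The `\b` before `end` keeps the pattern from stopping inside a longer word such as "friend".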
