Python regex that does not match inside http://

"This is a sample. http://www.egg1.com and http://egg2.com. This regex will only match this egg1 and egg2 and not the others contained inside http:// "
Match: egg1 egg2
Replaced: replaced1 replaced2

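3 Answers

User

#1 · edited 2024-06-30 16:45:28

One solution I can think of is to build a combined pattern that matches both HTTP URLs and your pattern, then filter the matches accordingly: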

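import re

t = "http://www.egg1.com http://egg2.com egg3 egg4"

# Group 1 swallows whole URLs; group 2 captures eggs outside URLs.
p = re.compile(r'(http://\S+)|(egg\d)')
for url, egg in p.findall(t):
    if egg:  # empty string when the match was a URL
        print(egg)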

Output:

egg3
egg4

Update: to use this idiom with re.sub(), just supply a replacement function that filters the matches:

# Group 1: a URL (left untouched); group 2: an egg outside a URL; group 3: its number.
p = re.compile(r'(http://\S+)|(egg(\d+))')

def repl(match):
    if match.group(2):
        return 'spam{0}'.format(match.group(3))
    return match.group(0)

print(p.sub(repl, t))

Output:

http://www.egg1.com http://egg2.com spam3 spam4
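As a rough sketch, the same idiom can be pointed at a sentence shaped like the one in the question. The sample text and the function name `replace_egg` are made up for illustration; the `replaced` prefix comes from the question's expected output.

```python
import re

# Hypothetical sample text, modelled on the question's example.
text = "See http://www.egg1.com and http://egg2.com, but match egg1 and egg2."

# Group 1 consumes URLs; groups 2 and 3 capture an egg outside a URL and its number.
pattern = re.compile(r'(http://\S+)|(egg(\d+))')

def replace_egg(match):
    # Only rewrite matches that came from the egg branch.
    if match.group(2):
        return 'replaced' + match.group(3)
    # URL matches are returned unchanged.
    return match.group(0)

result = pattern.sub(replace_egg, text)
print(result)
# See http://www.egg1.com and http://egg2.com, but match replaced1 and replaced2.
```

Because `\S+` is greedy, trailing punctuation such as the comma after `egg2.com` is treated as part of the URL, which is harmless here since the URL is returned as-is.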

User

#2 · edited 2024-06-30 16:45:28

You need to prefix your pattern with a negative lookbehind assertion:

(?<!http://)egg[0-9]

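With this regex, whenever the engine finds text matching egg[0-9], it looks back to verify that the text immediately before it does not match the lookbehind pattern. A negative lookbehind assertion starts with (?<! and ends with ). Whatever sits between these delimiters must not appear directly before the pattern that follows, and it is not included in the result.

How to use it in your case: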

>>> import re
>>> regex = re.compile(r'(?<!http://)egg[0-9]')
>>> a = "Example: http://egg1.com egg2 http://egg3.com egg4foo"
>>> regex.findall(a)
['egg2', 'egg4']
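One caveat worth checking: a lookbehind only inspects the characters directly before the match, so an egg that appears later in a URL (for example after `www.`) still gets through. Python's `re` module also requires lookbehind patterns to have a fixed width, so the assertion cannot simply be widened to something like `http://\S*`. A small sketch with a made-up input string:

```python
import re

regex = re.compile(r'(?<!http://)egg[0-9]')

# Hypothetical input: one URL with a "www." prefix, one without.
text = "http://www.egg1.com http://egg2.com egg3"

matches = regex.findall(text)
print(matches)
# ['egg1', 'egg3']  (egg1 slips through because "www." precedes it, not "http://")
```

If URLs with arbitrary prefixes need to be skipped, a combined pattern that consumes the whole URL first is the more robust choice.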

User

#3 · edited 2024-06-30 16:45:28

This will not capture http://...:

(?:http://.*?\s+)|(egg1)
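A minimal sketch of how this pattern might be used with `findall()`; the sample text is hypothetical. URL matches leave group 1 empty, so they have to be filtered out of the result.

```python
import re

# Pattern from the answer above: URLs are consumed by the non-capturing
# branch, so only standalone "egg1" occurrences fill group 1.
pattern = re.compile(r'(?:http://.*?\s+)|(egg1)')

# Hypothetical sample text; note the whitespace after each URL.
text = "http://www.egg1.com egg1 http://egg1.org egg1"

# findall() returns group 1 for every match, which is '' for URL matches.
eggs = [m for m in pattern.findall(text) if m]
print(eggs)
# ['egg1', 'egg1']
```

Note that the URL branch requires whitespace after the URL, so a URL at the very end of the string is not consumed and an `egg1` inside it would still be matched.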
