Unexpected end of pattern: Python Regex

import re

f = open("/Users/mymac/Desktop/regex.txt")
s = f.read()
s1 = re.sub(r'((?!http://|testing[0-9]|example[0-9]).*?)(CODE[0-9]{3})(?!</a>)', r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
print(s1)

<a href="http://productcode/CODE123">CODE123</a> <a href="http://productcode/CODE765">CODE765</a> testing1<a href="http://productcode/CODE123">CODE123</a> example1<a href="http://productcode/CODE345">CODE345</a> http://www.coding.com/<a href="http://productcode/CODE333">CODE333</a> <a href="http://productcode/CODE345">CODE345</a> <a href="http://productcode/CODE234">CODE234</a> <a href="http://productcode/CODE333">CODE333</a>

3 Answers

User
#1 · edited 2024-09-27 07:31:43

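The only problem I see is that you are replacing with the wrong capturing group.
modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)(CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
# ((?:...)*?) is the first capturing group
# (CODE[0-9]{3}) is the second one
# the replacement string is using the first group (\g<1>)
Here, I have made the first one a non-capturing group as well:
^(?i)(?:(?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)(CODE[0-9]{3})(?!</a>)
See it here on Regexr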
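
To make the group numbering concrete, here is a minimal sketch. It assumes a shortened pattern (only the http:// lookahead, and no inline flags so that it compiles on any Python version) and a made-up sample string; neither comes from the question.

```python
import re

# Hypothetical sample input, not taken from the question.
text = 'see CODE123'

# Prefix is captured: the product code is group 2.
capturing = re.compile(r'^((?:(?!http://).)*?)(CODE[0-9]{3})')
# Prefix is non-capturing: the product code becomes group 1.
non_capturing = re.compile(r'^(?:(?:(?!http://).)*?)(CODE[0-9]{3})')

groups_capturing = capturing.search(text).groups()
groups_non_capturing = non_capturing.search(text).groups()

# With the capturing version, the prefix can be re-emitted via \g<1>.
linked = capturing.sub(
    r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', text)
print(groups_capturing, groups_non_capturing)
print(linked)
```

One thing to keep in mind: re.sub replaces the whole match, so text consumed by a non-capturing prefix is dropped from the result unless it is captured and written back in the replacement.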

User
#2 · edited 2024-09-27 07:31:43

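Well, it looks like the problem is the (?-i), which is surprising. The whole point of the inline-modifier syntax is to let you apply modifiers to selected portions of the regex. At least, that is how they work in most flavors. In Python they seem to always modify the whole regex, the same as the external flags (re.I, re.M, etc.). The alternative (?i:xyz) syntax doesn't work either.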
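
For readers on a newer interpreter, a hedged sketch of how this can be checked. It assumes Python 3.6 or later, where the re documentation lists scoped groups of the form (?aiLmsux-imsx:...); the versions discussed in this thread (2.7 and 3.2) do not have them. The helper name compiles and the sample pattern are made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A bare (?-i) toggle in the middle of a pattern is rejected.
bare_toggle_ok = compiles(r'(?i)abc(?-i)CODE[0-9]{3}')

# Assumption: Python 3.6+. A scoped group can switch the flag off
# for just the part it encloses.
scoped = re.compile(r'(?i)abc(?-i:CODE[0-9]{3})')
upper_match = scoped.match('ABCCODE123')  # prefix case is ignored
lower_match = scoped.match('abccode123')  # CODE part stays case-sensitive
print(bare_toggle_ok, bool(upper_match), bool(lower_match))
```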
On another note, I don't see any reason to use three separate lookaheads, as you do here:
(?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?
Just OR them together:
(?:(?!http://|testing[0-9]|example[0-9]).)*?
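
A quick sketch to check that the two spellings agree. The sample strings are made up, and each fragment is followed by CODE[0-9]{3} so there is something to match; that tail is an assumption added for the test.

```python
import re

tail = r'CODE[0-9]{3}'
separate = re.compile(
    r'(?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?' + tail)
merged = re.compile(
    r'(?:(?!http://|testing[0-9]|example[0-9]).)*?' + tail)

# Hypothetical inputs; match() anchors each attempt at the start of the string.
samples = [
    'CODE123',
    'testing1CODE123',
    'abc CODE765',
    'http://www.coding.com/CODE333',
    'example1CODE345',
]

results = [(bool(separate.match(s)), bool(merged.match(s))) for s in samples]
for sample, pair in zip(samples, results):
    print(sample, pair)
```

Both versions reject a position as soon as any one of the three alternatives matches there, so they should accept and reject the same strings.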
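EDIT: We seem to have moved from the question of why the regex throws an exception to the question of why it doesn't work. I'm not sure I understand your requirements, but the regex and replacement string below return the results you want.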
s1 = re.sub(r'^((?!http://|testing[0-9]|example[0-9]).*?)(CODE[0-9]{3})(?!</a>)', r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
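See it in action on ideone.com
Is that what you're after?
EDIT: We now know that the replacements are being done within a larger text, not on standalone strings. That makes the problem much more difficult, but we also know that full URLs (the ones that start with http://) only occur inside anchor elements that already exist. That means we can split the regex into two alternatives: one to match complete <a>...</a> elements, and one to match the target strings.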
(?s)(?:(<a\s+[^>]*>.*?</a>)|\b((?:(?!testing[0-9]|example[0-9])\w)*?)(CODE[0-9]{3}))
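The trick is to use a function instead of a static string for the replacement. Whenever the regex matches an anchor element, the function finds it in group(1) and returns it unchanged. Otherwise, it uses group(2) and group(3) to build a new one.
Here's another demo (I know that's crappy code, but I'm too tired right now to learn a more Pythonic way.)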
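
A minimal sketch of that callback approach, using the regex above unchanged. The function name link_code and the sample text are made up for illustration, and the output format is assumed to be the productcode link from the question.

```python
import re

pattern = re.compile(
    r'(?s)(?:(<a\s+[^>]*>.*?</a>)'
    r'|\b((?:(?!testing[0-9]|example[0-9])\w)*?)(CODE[0-9]{3}))'
)

def link_code(match):
    """Return existing anchors untouched; wrap bare codes in a new anchor."""
    if match.group(1) is not None:
        # First alternative matched: a complete <a>...</a> element.
        return match.group(1)
    prefix, code = match.group(2), match.group(3)
    return '{0}<a href="http://productcode/{1}">{1}</a>'.format(prefix, code)

# Hypothetical input: a bare code, a code after an excluded prefix,
# an existing anchor, and a code after an allowed prefix.
text = ('CODE123 testing1CODE123 '
        '<a href="http://www.coding.com/CODE333">CODE333</a> xCODE234')
result = pattern.sub(link_code, text)
print(result)
```

Because re.sub accepts a callable, the decision between "leave alone" and "wrap in a link" is made per match, which a static replacement string cannot do.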

User
#3 · edited 2024-09-27 07:31:43

Your main problem is the (?-i), which is wishful thinking as far as Python 2.7 and 3.2 are concerned. For more details, see below.

import re
# modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)
# (CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
# observation 1: as presented, pattern has a line break in the middle, just after (?-i)
# ob 2: rather hard to read, should use re.VERBOSE
# ob 3: not obvious whether it's a compile-time or run-time problem
# ob 4: (?i) should be at the very start of the pattern (see docs)
# ob 5: what on earth is (?-i) ... not in 2.7 docs, not in 3.2 docs
pattern = r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)(CODE[0-9]{3})(?!</a>)'
#### rx = re.compile(pattern)
# above line failed with "sre_constants.error: unexpected end of pattern"
# try without the (?-i)
pattern2 = r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(CODE[0-9]{3})(?!</a>)'
rx = re.compile(pattern2)
# This works, now you need to work on observations 1 to 4,
# and rethink your CODE/code strategy

Looks like suggestions fall on deaf ears ... Here's the pattern in re.VERBOSE format:

pattern4 = r'''
    ^
    (?i)
    (
        (?:
            (?!http://)
            (?!testing[0-9])
            (?!example[0-9])
            . #### what is this for?
        )*?
    ) ##### end of capturing group 1
    (CODE[0-9]{3}) #### not in capturing group 1
    (?!</a>)
    '''
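
For anyone trying this layout on a current interpreter, a hedged sketch of one way it could be compiled. It assumes Python 3.6 or later and swaps the global (?i) for a scoped (?i:...) group around the prefix, since newer versions reject a global flag that is not at the very start of the pattern. The name pattern5 and the test strings are made up.

```python
import re

pattern5 = re.compile(r'''
    ^
    (                            # capturing group 1: the prefix
        (?i:                     # assumption: scoped flag, Python 3.6+
            (?:
                (?!http://)
                (?!testing[0-9])
                (?!example[0-9])
                .                # consume one character that starts none of the above
            )*?
        )
    )
    (CODE[0-9]{3})               # capturing group 2, still case-sensitive
    (?!</a>)
    ''', re.VERBOSE)

plain = pattern5.match('see CODE123')
lowercase_code = pattern5.match('see code123')
excluded_prefix = pattern5.match('TESTING1 CODE123')
print(plain.groups(), lowercase_code, excluded_prefix)
```

Passing re.VERBOSE as an argument to re.compile keeps the flag out of the pattern text, so only the case-insensitive scope has to be spelled inline.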
