What is the maximum repetition count allowed in a Python regular expression?

Posted 2024-05-21 07:41:47

In Python 2.7 and 3, the following works:

>>> re.search(r"a{1,9999}", 'aaa')
<_sre.SRE_Match object at 0x1f5d100>

But this raises an error:

[code block missing from the source]

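There seems to be an upper limit on the allowed repetition count. Is this part of the regular expression specification, or a Python-specific limitation? If it is Python-specific, is the actual number documented anywhere, and does it vary between implementations?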


1 answer
User
#1 · Posted 2024-05-21 07:41:47

A quick manual binary search turned up the answer: the largest accepted value is 65535.

>>> re.search(r"a{1,65535}", 'aaa')
<_sre.SRE_Match object at 0x2a9a68>
>>> 
>>> re.search(r"a{1,65536}", 'aaa')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
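
The manual binary search described above can also be automated. The sketch below is one way to do it, assuming that acceptance is monotonic (if `a{1,n}` compiles, every smaller bound does too) and that a rejected bound surfaces as `OverflowError` or `re.error`. The helper names `accepts` and `largest_accepted_bound` are made up for illustration. The result depends on the interpreter, so it may not be 65535 on versions other than the one shown above.

```python
import re


def accepts(n):
    """Return True if the regex engine compiles a{1,n} without error."""
    try:
        re.compile("a{1,%d}" % n)
        return True
    except (OverflowError, re.error):
        # Assumption: a too-large bound is reported as one of these two errors.
        return False


def largest_accepted_bound(upper=2**64):
    """Binary-search the largest n in [1, upper) for which a{1,n} compiles.

    Returns None if `upper` itself is accepted, i.e. no limit was found.
    """
    if accepts(upper):
        return None
    lo, hi = 1, upper  # invariant: lo is accepted, hi is rejected
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accepts(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(largest_accepted_bound())
```

The search needs only about 64 compilations for the default `upper`, so it finishes quickly.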

This is discussed here:

The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.

and

The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
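
Whether 65535 really behaves as "no upper limit" can be checked directly on a given interpreter. This is a minimal sketch, assuming a 70000-character test string is long enough to tell the two patterns apart. If the quoted behaviour applies, both matches end at 70000; if the interpreter treats 65535 as an ordinary bound, the bounded match stops at 65535.

```python
import re

# Longer than 65535 characters so the two patterns can give different results.
text = "x" * 70000

bounded = re.match(r".{0,65535}", text).end()
unbounded = re.match(r".*", text).end()

print("bounded   match ends at:", bounded)
print("unbounded match ends at:", unbounded)
print("65535 acts as 'no limit':", bounded == unbounded)
```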


Thanks to the authors of the comments below for pointing out a few more things:

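  • CPython implements this limit in [file reference missing from the source]. (@LukasGraf)
  • A constant MAXREPEAT, defined in [module reference missing from the source], holds this maximum repetition value:

    [code block missing from the source]

    (@MarkkuK. and @hcwhsa)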
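
The module that holds MAXREPEAT is not preserved in the text above, so the lookup below is only a sketch under assumptions: it guesses that CPython exposes the constant on the internal `_sre` module and, failing that, on `sre_constants` (which newer versions deprecate, hence the suppressed warning). The helper name `get_maxrepeat` is hypothetical, and other Python implementations may offer neither module.

```python
import warnings


def get_maxrepeat():
    """Best-effort lookup of the regex engine's MAXREPEAT; None if not found."""
    try:
        import _sre  # assumption: CPython-internal module exposing MAXREPEAT
        return _sre.MAXREPEAT
    except (ImportError, AttributeError):
        pass
    try:
        with warnings.catch_warnings():
            # Assumption: importing sre_constants may emit a DeprecationWarning.
            warnings.simplefilter("ignore", DeprecationWarning)
            import sre_constants
        return sre_constants.MAXREPEAT
    except (ImportError, AttributeError):
        return None


if __name__ == "__main__":
    print(get_maxrepeat())
```

The printed value is interpreter-specific and need not equal the 65535 found above.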