Python2re.sub公司:在回溯时中止灾难性模式

2024-10-02 18:19:40 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在使用re.sub() 有一些复杂的模式(由代码创建)会导致回溯。在

在Python2.6中,有没有什么实际的方法可以中止re.sub(假设没有找到模式,或者引发一个错误)?在

示例(这当然是一个哑模式,但它是由复杂的文本处理引擎动态创建的):

>>>re.sub('[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*[i1l!|](?:[^i1l!|\\w]|[i1l!|])*[l1i!|](?:[^l1i!||\\w]|[l1i!|])*','*','ilililililililililililililililililililililililililililililililililil :x')


Tags: 方法代码引擎re示例错误模式动态创建
2条回答

{cd1>您可以在这里帮助您:

In [9]: re.sub ?
Type:       function
Base Class: <type 'function'>
String Form:<function sub at 0x00AC7CF0>
Namespace:  Interactive
File:       c:\python27\lib\re.py
Definition: re.sub(pattern, repl, string, count=0, flags=0)
Docstring:
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.  repl can be either a string or a callable;
if a string, backslash escapes in it are processed.  If it is
a callable, it's passed the match object and must return
a replacement string to be used.


In [13]: a = "bbbbbbb"

In [14]: x = re.sub('b', 'a', a, count=3)

In [15]: x
Out[15]: 'aaabbbb'

除了分析正则表达式的潜在灾难性回溯(外部正则表达式的一个难题)或使用不允许回溯的不同正则表达式引擎之外,我认为唯一的方法是使用这种性质的超时:

import re
import signal

class Timeout(Exception): 
    pass 

def try_one(pat,rep,s,t=3):
    def timeout_handler(signum, frame):
        raise Timeout()

    old_handler = signal.signal(signal.SIGALRM, timeout_handler) 
    signal.alarm(t) 

    try: 
        ret=re.sub(pat, rep, s)

    except Timeout:
        print('"{}" timed out after {} seconds'.format(pat,t))
        return None

    finally:
        signal.signal(signal.SIGALRM, old_handler) 

    signal.alarm(0)
    return ret

try_one(r'^(.+?)\1+$', r'\1' ,"a" * 1000000 + "b")

试图替换单个字符的大量重复(在本例中为一百万个a字符)是一个classic catastrophic regex failure。它将需要数万年才能完成(至少使用Python或Perl)。Awk是不同的)。在

尝试3秒后,它优雅地超时并打印:

^{pr2}$

相关问题 更多 >