Python 2 re.sub: aborting a catastrophic pattern on backtracking

2 answers

User

#1 · edited 2024-10-02 18:19:40

The following may help you here; note the `count` argument:

In [9]: re.sub ?
Type:       function
Base Class: <type 'function'>
String Form:<function sub at 0x00AC7CF0>
Namespace:  Interactive
File:       c:\python27\lib\re.py
Definition: re.sub(pattern, repl, string, count=0, flags=0)
Docstring:
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.  repl can be either a string or a callable;
if a string, backslash escapes in it are processed.  If it is
a callable, it's passed the match object and must return
a replacement string to be used.


In [13]: a = "bbbbbbb"

In [14]: x = re.sub('b', 'a', a, count=3)

In [15]: x
Out[15]: 'aaabbbb'
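
As a small sketch of the same idea, `re.subn` returns the number of substitutions alongside the new string, which makes the effect of `count` easy to see. The variable names here are illustrative only. Note that `count` limits how many replacements are made, not how long the engine spends searching for a single match.

```python
import re

text = "bbbbbbb"

# count=3 stops after the first three matches.
limited, n_limited = re.subn('b', 'a', text, count=3)

# count=0 (the default) replaces every match.
everything, n_all = re.subn('b', 'a', text)

print("{} ({} replacements)".format(limited, n_limited))
print("{} ({} replacements)".format(everything, n_all))
```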

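User

#2 · edited 2024-10-02 18:19:40

Short of analyzing the regex for potential catastrophic backtracking (a hard problem when the regex is supplied externally) or using a different regex engine that does not backtrack, I think the only way is a timeout along these lines: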

import re
import signal

class Timeout(Exception):
    """Raised by the SIGALRM handler when the time limit expires."""

def try_one(pat, rep, s, t=3):
    def timeout_handler(signum, frame):
        raise Timeout()

    # SIGALRM is Unix-only, and handlers can only be set from the main thread.
    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(t)

    try:
        return re.sub(pat, rep, s)
    except Timeout:
        print('"{}" timed out after {} seconds'.format(pat, t))
        return None
    finally:
        # Always cancel the pending alarm, then restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

try_one(r'^(.+?)\1+$', r'\1', "a" * 1000000 + "b")

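Trying to replace a large repetition of a single character (in this case, one million "a" characters) is a classic catastrophic regex failure. It would take tens of thousands of years to complete (at least with Python or Perl; Awk is different).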
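
To get a feel for the slowdown without waiting, here is a rough sketch that times the same pattern on much shorter inputs. The helper name `time_failed_sub` and the input sizes are assumptions chosen for illustration. The measured times depend on the machine, but on a backtracking engine they typically grow much faster than the input length.

```python
import re
from timeit import default_timer

PATTERN = re.compile(r'^(.+?)\1+$')

def time_failed_sub(n):
    """Return (seconds, result) for a substitution on n 'a' characters plus one 'b'."""
    s = "a" * n + "b"
    start = default_timer()
    result = PATTERN.sub(r'\1', s)
    return default_timer() - start, result

for n in (200, 400, 800, 1600):
    seconds, result = time_failed_sub(n)
    # The trailing "b" prevents any match, so the string comes back unchanged
    # after the engine has exhausted every candidate for group 1.
    unchanged = (result == "a" * n + "b")
    print("n={:5d}  {:.4f}s  unchanged={}".format(n, seconds, unchanged))
```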

With the 3-second limit, the `try_one` call above times out gracefully and prints:

"^(.+?)\1+$" timed out after 3 seconds
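
If whole-second granularity is too coarse, one possible variation is `signal.setitimer`, which accepts fractional seconds. The sketch below wraps it in a context manager; the names `RegexTimeout`, `time_limit`, and `sub_with_limit` are hypothetical and not part of the answer above. Like `signal.alarm`, this is assumed to run on Unix in the main thread.

```python
import re
import signal
from contextlib import contextmanager

class RegexTimeout(Exception):
    """Hypothetical exception raised when the time limit expires."""

@contextmanager
def time_limit(seconds):
    """Raise RegexTimeout if the with-block runs longer than `seconds`."""
    def handler(signum, frame):
        raise RegexTimeout()

    old_handler = signal.signal(signal.SIGALRM, handler)
    # ITIMER_REAL delivers SIGALRM after `seconds` of wall-clock time.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel any pending timer, then restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)

def sub_with_limit(pat, rep, s, seconds=0.5):
    """Return the substituted string, or None if the time limit was hit."""
    try:
        with time_limit(seconds):
            return re.sub(pat, rep, s)
    except RegexTimeout:
        return None

print(sub_with_limit(r'b', r'a', "bbbbbbb"))
print(sub_with_limit(r'^(.+?)\1+$', r'\1', "a" * 1000000 + "b"))
```

Returning None on timeout mirrors `try_one`; a caller that needs to tell a timeout apart from other outcomes could let `RegexTimeout` propagate instead.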
