Performance difference between re.match and re.search

>>> import timeit
>>> s1 = '''
... import re
... re.search(r'hello','helloab'*100000)
... '''
>>> timeit.timeit(stmt=s1,number=10000)
32.12064480781555
>>> s = '''
... import re
... re.match(r'hello','helloab'*100000)
... '''
>>> timeit.timeit(stmt=s,number=10000)
30.9136700630188

2 Answers

User

Answer 1 · edited 2024-10-02 02:34:17

"So, the updated question is now why search is out-performing match?"

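In this particular instance, where the pattern is a literal string rather than a regex pattern, re.search is indeed slightly faster than re.match on the default CPython implementation (I have not tested this on other Python implementations).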

>>> print timeit.timeit(stmt="r.match(s)",
...              setup="import re; s = 'helloab'*100000; r = re.compile('hello')",
...              number = 10000000)
3.29107403755
>>> print timeit.timeit(stmt="r.search(s)",
...              setup="import re; s = 'helloab'*100000; r = re.compile('hello')",
...             number = 10000000)
2.39184308052

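Looking at the C code behind those modules, the search code appears to have a built-in optimization to quickly match patterns prefixed with a string literal. In the example above, the whole pattern is a literal string with no regex metacharacters, so this optimized routine is used to match the entire pattern.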

Notice how performance degrades once we introduce regex symbols, and as the literal prefix gets shorter:

[timing listing not preserved in the source]
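The effect described above could be measured along the following lines. This is only a minimal sketch for Python 3: the pattern list and the helper name `time_search` are assumptions chosen for illustration, not the answer's original test cases, and absolute timings depend on the machine and interpreter version.

```python
import timeit

# Hypothetical patterns: each has a shorter literal prefix than the one before.
PATTERNS = ["hello", "hell.", "hel.o", "he.lo", "h.llo", ".ello"]

# The string and the compiled pattern are built once in setup, not in the timed statement.
SETUP = "import re; s = 'helloab' * 100000; r = re.compile({pattern!r})"


def time_search(pattern, number=100000):
    """Return the seconds taken by `number` calls of r.search(s) for `pattern`."""
    return timeit.timeit(stmt="r.search(s)",
                         setup=SETUP.format(pattern=pattern),
                         number=number)


if __name__ == "__main__":
    for pattern in PATTERNS:
        print("%-8s %.4f s" % (pattern, time_search(pattern)))
```

If the literal-prefix optimization behaves as described, the timings would be expected to rise as the prefix shrinks, but that should be checked on the interpreter at hand rather than assumed.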

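For the part of the pattern that contains regex constructs, SRE_MATCH is used to determine the match. This is essentially the same code that sits behind re.match.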

Note how close the results are (with re.match slightly faster) when the pattern starts with a regex construct rather than a literal string.

>>> print timeit.timeit(stmt="r.match(s)",
...              setup="import re; s = 'helloab'*100000; r = re.compile('.ello')",
...              number = 10000000)
3.22782492638
>>> print timeit.timeit(stmt="r.search(s)",
...              setup="import re; s = 'helloab'*100000; r = re.compile('.ello')",
...             number = 10000000)
3.31773591042

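In other words, setting aside the fact that search and match serve different purposes, re.search is faster than re.match only when the pattern is a literal string.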
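The "different purposes" mentioned here can be seen in a short Python 3 example. The sample text is an assumption chosen for illustration: match only tries the pattern at the start of the string, while search scans forward for the first occurrence.

```python
import re

text = "say hello"  # hypothetical input where the literal is not at the start

# match only succeeds if the pattern matches at position 0.
assert re.match(r"hello", text) is None

# search scans forward and reports the first match anywhere in the string.
found = re.search(r"hello", text)
assert found is not None and found.start() == 4

# Anchoring search with \A gives the same result as match for this input.
assert re.search(r"\Ahello", text) is None

# When the literal is at the start, both functions report the same span.
assert re.match(r"hello", "hello world").span() == (0, 5)
assert re.search(r"hello", "hello world").span() == (0, 5)
```

So the two calls are interchangeable only when a match at the very start of the string is what is wanted; otherwise the timing comparison is between two different questions.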

Of course, if you are working with literal strings, you are probably better off using string operations instead.

>>> # Detecting exact matches
>>> print timeit.timeit(stmt="s == r", 
...              setup="s = 'helloab'*100000; r = 'hello'", 
...              number = 10000000)
0.339027881622

>>> # Determine if string contains another string (as timed, this tests whether s occurs in r; "r in s" would test whether s contains r)
>>> print timeit.timeit(stmt="s in r", 
...              setup="s = 'helloab'*100000; r = 'hello'", 
...              number = 10000000)
0.479326963425


>>> # detecting prefix
>>> print timeit.timeit(stmt="s.startswith(r)",
...              setup="s = 'helloab'*100000; r = 'hello'",
...             number = 10000000)
1.49393510818
>>> print timeit.timeit(stmt="s[:len(r)] == r",
...              setup="s = 'helloab'*100000; r = 'hello'",
...             number = 10000000)
1.21005606651
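As a rough guide to which string operation stands in for which regex call when the pattern is a plain literal, here is a minimal Python 3 sketch. The helper names `has_prefix` and `contains` are hypothetical, and `re.escape` is used so the literal is treated as text by the regex engine. Note the operand order for `in`: the substring being looked for goes on the left.

```python
import re

s = "helloab" * 3   # shortened stand-in for the long test string
r = "hello"


def has_prefix(text, literal):
    """String equivalent of re.match with a literal pattern."""
    return text.startswith(literal)


def contains(text, literal):
    """String equivalent of re.search with a literal pattern."""
    return literal in text


# Each string operation agrees with the corresponding regex call.
assert has_prefix(s, r) == (re.match(re.escape(r), s) is not None)
assert contains(s, r) == (re.search(re.escape(r), s) is not None)
assert (s == r) == (re.fullmatch(re.escape(r), s) is not None)

# A literal that occurs in the middle but not at the start.
assert contains(s, "ab") and not has_prefix(s, "ab")
```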

User

Answer 2 · edited 2024-10-02 02:34:17

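On my machine (Python 2.7.3 on Mac OS 10.7.3, 1.7 GHz Intel Core i5), when the string construction, the import of re, and the regex compilation are done in setup and 10000000 iterations are run (instead of 10), I find the opposite: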

import timeit

print timeit.timeit(stmt="r.match(s)",
             setup="import re; s = 'helloab'*100000; r = re.compile('hello')",
             number = 10000000)
# 6.43165612221
print timeit.timeit(stmt="r.search(s)",
             setup="import re; s = 'helloab'*100000; r = re.compile('hello')",
            number = 10000000)
# 3.85176897049
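The point about moving work into setup can be illustrated with a small Python 3 sketch. The function names `time_with_setup` and `time_inline` are hypothetical; the sketch only shows the two ways of structuring the measurement and makes no claim about the resulting numbers.

```python
import timeit

SETUP = "import re; s = 'helloab' * 100000; r = re.compile('hello')"


def time_with_setup(number=1000):
    """Time only r.search(s); the string and compiled pattern are built once in setup."""
    return timeit.timeit(stmt="r.search(s)", setup=SETUP, number=number)


def time_inline(number=1000):
    """Time the whole statement, so string construction and the pattern
    lookup inside re.search are included in every iteration."""
    stmt = "import re; re.search('hello', 'helloab' * 100000)"
    return timeit.timeit(stmt=stmt, number=number)


if __name__ == "__main__":
    print("setup separated: %.6f s" % time_with_setup())
    print("everything timed: %.6f s" % time_inline())
```

When everything sits inside the timed statement, the cost of building the 700000-character string can swamp the difference between match and search, which is one plausible reason the two measurements in the question are so close.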
