Answers to: performance difference between re.match and re.search

Performance difference between re.match and re.search


1 answer

Anonymous, 1 day ago


<blockquote> "So, the updated question is now why search is out-performing match?" </blockquote>

In this particular instance, where a literal string is used rather than a regex pattern, <code>re.search</code> is indeed marginally faster than <code>re.match</code> on the default CPython implementation (I have not tested this on other implementations of Python).

<pre><code>>>> import timeit
>>> print(timeit.timeit(stmt="r.match(s)",
...     setup="import re; s = 'helloab'*100000; r = re.compile('hello')",
...     number=10000000))
3.29107403755
>>> print(timeit.timeit(stmt="r.search(s)",
...     setup="import re; s = 'helloab'*100000; r = re.compile('hello')",
...     number=10000000))
2.39184308052
</code></pre>

Looking at the <a href="http://hg.python.org/cpython/file/0ee03c9b098f/Modules/_sre.c" rel="noreferrer">C code behind those modules</a>, the search code appears to have a built-in optimisation <a href="http://hg.python.org/cpython/file/0ee03c9b098f/Modules/_sre.c#l1506" rel="noreferrer">to quickly match patterns prefixed with a string literal</a>. In the example above, the whole pattern is a literal string with no regex metacharacters, so this optimised routine is used to match the entire pattern.

Once regex symbols are introduced, performance degrades, and it degrades further as the literal prefix gets shorter.

For the portion of the pattern that contains regex metacharacters, <a href="http://hg.python.org/cpython/file/0ee03c9b098f/Modules/_sre.c#l776" rel="noreferrer">SRE_MATCH</a> is used to determine matches. That is essentially the same code that sits behind <code>re.match</code>.

Note how close the results are (with <code>re.match</code> marginally faster) when the pattern starts with a regex symbol instead of a literal string.

<pre><code>>>> print(timeit.timeit(stmt="r.match(s)",
...     setup="import re; s = 'helloab'*100000; r = re.compile('.ello')",
...     number=10000000))
3.22782492638
>>> print(timeit.timeit(stmt="r.search(s)",
...     setup="import re; s = 'helloab'*100000; r = re.compile('.ello')",
...     number=10000000))
3.31773591042
</code></pre>

<hr/>

In other words, setting aside the fact that <code>search</code> and <code>match</code> serve different purposes, <code>re.search</code> is faster than <code>re.match</code> only when the pattern is a literal string.

Of course, if you are working with literal strings, you are likely better off using plain string operations instead.

<pre><code>>>> # Detecting exact matches
>>> print(timeit.timeit(stmt="s == r",
...     setup="s = 'helloab'*100000; r = 'hello'",
...     number=10000000))
0.339027881622
>>> # Determine if string contains another string
>>> # (timed as written; the test for r occurring inside s is "r in s")
>>> print(timeit.timeit(stmt="s in r",
...     setup="s = 'helloab'*100000; r = 'hello'",
...     number=10000000))
0.479326963425
>>> # Detecting prefix
>>> print(timeit.timeit(stmt="s.startswith(r)",
...     setup="s = 'helloab'*100000; r = 'hello'",
...     number=10000000))
1.49393510818
>>> print(timeit.timeit(stmt="s[:len(r)] == r",
...     setup="s = 'helloab'*100000; r = 'hello'",
...     number=10000000))
1.21005606651
</code></pre>
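The timings quoted in the answer come from one machine and an older CPython, so they may not hold on a current interpreter. The following is a minimal sketch for re-running the same comparison locally. The helper names `time_call` and `compare` are hypothetical, the pattern `'hell.'` is an assumed example of a pattern with a shortened literal prefix, and the default repeat count is reduced from the answer's 10,000,000 to keep the run short.

```python
import re
import timeit

# Setup string mirroring the answer's listings: same subject string,
# with the pattern substituted in. The template itself is an assumption.
SETUP = "import re; s = 'helloab' * 100000; r = re.compile({pattern!r})"


def time_call(method, pattern, number=100000):
    """Return the seconds taken by `number` calls of r.<method>(s)."""
    return timeit.timeit(
        stmt="r.{}(s)".format(method),
        setup=SETUP.format(pattern=pattern),
        number=number,
    )


def compare(patterns=("hello", "hell.", ".ello"), number=100000):
    """Map each pattern to a (match_seconds, search_seconds) pair."""
    return {
        p: (time_call("match", p, number), time_call("search", p, number))
        for p in patterns
    }


if __name__ == "__main__":
    for pattern, (t_match, t_search) in compare().items():
        print("{!r:10} match={:.4f}s  search={:.4f}s".format(
            pattern, t_match, t_search))
```

Absolute numbers will differ from those quoted above, and the relative ordering may differ too, since the regex engine has changed between CPython versions. Treat the output as a measurement of your own interpreter rather than a confirmation of the figures in the answer.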
