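<p>I don't think you can get below <code>O(n)</code> (since you need to iterate over the string at least once), but you can make some optimizations.</p>
<p>I assume you want to match "<em>whole words</em>"; for example, looking for <code>foo</code> should match like this:</p>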
<pre class="lang-none prettyprint-override"><code>foo and foo, or foobar and not foo.
^^^     ^^^                    ^^^
</code></pre>
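<p>So a split based only on spaces won't work, because punctuation stays attached to the neighbouring word: tokens such as <code>foo,</code> and <code>foo.</code> do not compare equal to <code>foo</code>.</p>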
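<p>A minimal sketch of that problem, reusing the example sentence from above (the variable names are illustrative only):</p>

```python
text = 'foo and foo, or foobar and not foo.'

# Splitting on spaces leaves punctuation attached to the tokens.
tokens = text.split(' ')
print(tokens)

# Only tokens exactly equal to 'foo' are counted, so 'foo,' and 'foo.' are missed.
naive_count = sum(1 for token in tokens if token == 'foo')
print(naive_count)  # 1, although the whole word occurs 3 times
```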
<p>This is where the <a href="https://docs.python.org/3.2/library/re.htm" rel="noreferrer"><code>re</code> module</a> comes in handy, since it lets you build more sophisticated conditions. For example, <code>\b</code> in a regexp means:</p>
<blockquote>
<p>Matches the empty string, but only at the beginning or end of a word. <em>A word is defined as a sequence of Unicode alphanumeric or underscore characters, so the end of a word is indicated by <strong>whitespace or a non-alphanumeric</strong></em>, non-underscore Unicode character. Note that formally, <code>\b</code> is defined as the boundary between a <code>\w</code> and a <code>\W</code> character (or vice versa), or between <code>\w</code> and the beginning/end of the string. This means that <code>r'\bfoo\b'</code> matches <code>'foo'</code>, <code>'foo.'</code>, <code>'(foo)'</code>, <code>'bar foo baz'</code> but not <code>'foobar'</code> or <code>'foo3'</code>.</p>
</blockquote>
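<p>As a quick check of the behaviour described in that excerpt, the sketch below runs <code>r'\bfoo\b'</code> against the same sample strings the documentation lists (the names <code>samples</code> and <code>results</code> are just for illustration):</p>

```python
import re

pattern = re.compile(r'\bfoo\b')

# Sample strings taken from the documentation excerpt above.
samples = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']

# Map each sample to whether the whole word 'foo' occurs in it.
results = {s: bool(pattern.search(s)) for s in samples}
for s, matched in results.items():
    print(repr(s), matched)
```

<p>The first four strings should report <code>True</code> and the last two <code>False</code>, since in <code>'foobar'</code> and <code>'foo3'</code> the final <code>o</code> is followed by another word character.</p>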
<p>So <code>r'\bfoo\b'</code> will match only the <em>whole word <code>foo</code></em>. Also, don't forget to use <a href="https://docs.python.org/3.2/library/re.html#re.escape" rel="noreferrer"><code>re.escape</code></a>:</p>
<pre><code>>>> import re
>>> re.escape('foo.bar+')
'foo\\.bar\\+'
>>> r'\b{}\b'.format(re.escape('foo.bar+'))
'\\bfoo\\.bar\\+\\b'
</code></pre>
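<p>To illustrate why the escaping matters, here is a small sketch. The helper <code>whole_word_pattern</code> is a hypothetical name introduced for this example, not something from the <code>re</code> module:</p>

```python
import re

def whole_word_pattern(word):
    """Hypothetical helper: compile a whole-word pattern for a literal search term."""
    return re.compile(r'\b{}\b'.format(re.escape(word)))

# Without escaping, '.' is a wildcard and matches any character.
unescaped = re.compile(r'\bfoo.bar\b')
escaped = whole_word_pattern('foo.bar')

print(bool(unescaped.search('fooXbar')))         # True: '.' matched 'X'
print(bool(escaped.search('fooXbar')))           # False: a literal '.' is required
print(bool(escaped.search('see foo.bar here')))  # True
```

<p>One caveat, which follows from the definition of <code>\b</code>: if the search term starts or ends with a non-word character (such as the trailing <code>+</code> in <code>'foo.bar+'</code>), <code>\b</code> on that side only matches when a word character is adjacent, so <code>'foo.bar+ baz'</code> would not match while <code>'foo.bar+baz'</code> would.</p>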
<p>Now all you have to do is scan the string with <a href="https://docs.python.org/3.2/library/re.html#re.finditer" rel="noreferrer"><code>re.finditer</code></a>. According to the documentation:</p>
<blockquote>
<p>Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.</p>
</blockquote>
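<p>I assume the matches are generated on the fly, so they never all have to be held in memory at once (which can be useful with <strong>large</strong> strings containing many matched items). Finally, just count them:</p>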
<pre><code>>>> r = re.compile(r'\bfoo\b')
>>> it = r.finditer('foo and foo, or foobar and not foo.')
>>> sum(1 for _ in it)
3
</code></pre>
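<p>Putting the pieces together, the steps above could be wrapped up roughly like this. The function name <code>count_whole_word</code> is an assumption made for this sketch:</p>

```python
import re

def count_whole_word(word, text):
    """Count non-overlapping whole-word occurrences of `word` in `text`."""
    # Escape the term so regex metacharacters in it are treated literally.
    pattern = re.compile(r'\b{}\b'.format(re.escape(word)))
    # Consume the iterator lazily instead of building a list of matches.
    return sum(1 for _ in pattern.finditer(text))

sentence = 'foo and foo, or foobar and not foo.'
print(count_whole_word('foo', sentence))  # 3
print(count_whole_word('bar', sentence))  # 0: 'bar' only appears inside 'foobar'
```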