<p>Use a two-step approach: split the string, then analyze each word:</p>
<pre><code>import re
strings = ["3n3k game gnma34 xbox360 table", "the a22b b3kj3 ps4 2ij2aln potato"]
exceptions = ['xbox360', 'ps4']
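
def cleanse(word):
    # Drop any word that contains a digit, unless it is a listed exception
    rx = re.compile(r'\D*\d')
    if rx.match(word) and word not in exceptions:
        return ''
    return word

# filter(None, ...) discards the empty strings returned by cleanse()
nstrings = [" ".join(filter(None, (
                cleanse(word) for word in string.split())))
            for string in strings]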
print(nstrings)
# ['game xbox360 table', 'the ps4 potato']
</code></pre>
<p/><hr/>
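<p>Additionally, I changed the regular expression to <code>\D*\d</code> and match it at the very start of each "word" (with <code>re.match()</code>), because <code>\w</code> includes digits as well.</p>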
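<p>As a minimal sketch of why this works (the sample words and the <code>has_digit</code> name are illustrative, not part of the original question): <code>re.match()</code> only attempts a match at position 0, and <code>\D*</code> consumes any leading non-digits, so the pattern succeeds exactly when the word contains a digit somewhere.</p>

```python
import re

rx = re.compile(r'\D*\d')

# re.match() anchors at the start of the word; \D* skips leading
# non-digits, so the match succeeds only if a digit follows somewhere.
words = ["game", "gnma34", "3n3k", "xbox360"]
has_digit = {word: bool(rx.match(word)) for word in words}
print(has_digit)

# \w matches digits too, so a pattern built only on \w+ cannot
# distinguish "game" from "gnma34".
w_matches_both = all(re.fullmatch(r'\w+', w) for w in ("game", "gnma34"))
print(w_matches_both)
```

<p>Note that <code>xbox360</code> is also flagged here, which is why the exceptions list is still needed.</p>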
<p/><hr/>
<p>If you are able to upgrade to the <a href="https://pypi.python.org/pypi/regex/" rel="nofollow noreferrer"><strong>newer <code>regex</code> module</strong></a>, you can use <code>(*SKIP)(*FAIL)</code> and a better expression, without needing a function:</p>
<pre><code>\b(?:xbox360|ps4)\b # define your exceptions
(*SKIP)(*FAIL) # these shall fail
| # or match words with digits
\b[A-Za-z]*\d\w*\b
</code></pre>
<p>See <a href="https://regex101.com/r/t2nFHG/2/" rel="nofollow noreferrer"><strong>a demo on regex101.com</strong></a> and the complete <code>Python</code> snippet:</p>
<pre><code>import regex as re
strings = ["3n3k game gnma34 xbox360 table", "the a22b b3kj3 ps4 2ij2aln potato 123123 1234"]
exceptions = [r'\d+', 'xbox360', 'ps4']
rx = re.compile(r'\b(?:{})\b(*SKIP)(*FAIL)|\b[A-Za-z]*\d\w*\b'.format("|".join(exceptions)))
nstrings = [" ".join(
                filter(None, (rx.sub('', word)
                              for word in string.split())))
            for string in strings]
print(nstrings)
# ['game xbox360 table', 'the ps4 potato 123123 1234']
</code></pre>
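<p>If installing the <code>regex</code> module is not an option, a similar effect can probably be had with the standard <code>re</code> module. The sketch below swaps <code>(*SKIP)(*FAIL)</code> for a negative lookahead; this is a different technique, not the one shown above, and it assumes the same sample strings and exception list.</p>

```python
import re

strings = ["3n3k game gnma34 xbox360 table",
           "the a22b b3kj3 ps4 2ij2aln potato 123123 1234"]
# Each entry is a regex fragment, so r'\d+' keeps purely numeric words.
exceptions = [r'\d+', 'xbox360', 'ps4']

# Negative lookahead: do not start a match where a whole exception word
# begins; otherwise match any word that contains a digit.
pattern = r'\b(?!(?:{})\b)[A-Za-z]*\d\w*\b'.format("|".join(exceptions))
rx = re.compile(pattern)

nstrings = [" ".join(filter(None, (rx.sub('', word)
                                   for word in string.split())))
            for string in strings]
print(nstrings)
# ['game xbox360 table', 'the ps4 potato 123123 1234']
```

<p>The lookahead only guards the position where a word starts, so it may behave differently from <code>(*SKIP)(*FAIL)</code> on inputs where an exception appears in the middle of a longer token.</p>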