<p>I suggest</p>
<pre><code>\s*\b(?=[a-zA-Z\d]*([a-zA-Z\d])\1{3}|\d+\b)[a-zA-Z\d]+
</code></pre>
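<p>See the <a href="https://regex101.com/r/qA0aS0/1" rel="nofollow">regex demo</a>.</p>
<p>Only alphanumeric words are matched with this pattern:</p>
<ul>
<li><code>\s*</code> - zero or more whitespace characters</li>
<li><code>\b</code> - a word boundary</li>
<li><code>(?=[a-zA-Z\d]*([a-zA-Z\d])\1{3}|\d+\b)</code> - a lookahead requiring that the word either contains at least 4 identical consecutive letters or digits, or consists of digits only</li>
<li><code>[a-zA-Z\d]+</code> - the word itself: 1 or more letters or digits</li>
</ul>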
<p><a href="http://ideone.com/5OiEtS" rel="nofollow">Python demo:</a></p>
^{pr2}$
<p>Note that <code>strip()</code> removes the whitespace that may remain at the start of the string.</p>
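<p>The code of the linked Python demo is not reproduced here, so the following is only a minimal sketch of how the pattern above could be applied with <code>re.sub</code>. The helper name <code>clean</code> is hypothetical, and the sample strings are borrowed from the R example in this answer.</p>

```python
import re

# Pattern from this answer: optional leading whitespace, a word boundary,
# a lookahead requiring either 4 identical consecutive characters somewhere
# in the word or an all-digit word, then the alphanumeric word itself.
PATTERN = re.compile(r"\s*\b(?=[a-zA-Z\d]*([a-zA-Z\d])\1{3}|\d+\b)[a-zA-Z\d]+")


def clean(text):
    """Hypothetical helper: remove the matching words, then trim the ends."""
    # strip() drops whitespace left over at either end of the string
    return PATTERN.sub("", text).strip()


samples = [
    "df",
    "All aaaaaab the best 8965",
    "US issssss is 123 good ",
    "qqqq qwerty 1 poiks",
    "lkjh ggggqwe 1234 aqwe iphone5224s",
]

for s in samples:
    print(repr(clean(s)))
```

<p>Because the pattern consumes the whitespace <em>before</em> each removed word, a removed first word leaves a leading space behind, which is why the final <code>strip()</code> is needed.</p>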
<p>A similar solution in R, using a TRE regex:</p>
<pre><code>x <- c("df", "All aaaaaab the best 8965", "US issssss is 123 good ", "qqqq qwerty 1 poiks", "lkjh ggggqwe 1234 aqwe iphone5224s")
p <- " *\\b(?:[[:alnum:]]*([[:alnum:]])\\1{3}[[:alnum:]]*|[0-9]+)\\b"
gsub(p, "", x)
</code></pre>
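<p>See the <a href="http://ideone.com/k51nyu" rel="nofollow">demo</a>.</p>
<p><em>Pattern details</em> and <a href="https://regex101.com/r/sL0wE8/1" rel="nofollow">demo</a>:</p>
<ul>
<li><code>\s*</code> - 0+ whitespace characters (the R code above uses a literal space followed by <code>*</code>, i.e. 0+ spaces)</li>
<li><code>\b</code> - a leading word boundary</li>
<li><code>(?:[[:alnum:]]*([[:alnum:]])\1{3}[[:alnum:]]*|[0-9]+)</code> - either of the two alternatives:
<ul>
<li><code>[[:alnum:]]*([[:alnum:]])\1{3}[[:alnum:]]*</code> - 0+ alphanumeric characters, followed by 4 identical alphanumeric characters, and then 0+ alphanumeric characters</li>
<li><code>|</code> - or</li>
<li><code>[0-9]+</code> - 1 or more digits</li>
</ul></li>
<li><code>\b</code> - a trailing word boundary</li>
</ul>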
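<p>The alternation-based form is not tied to R. As a rough sketch, assuming ASCII-only input, the same idea can be written for Python's <code>re</code> module by replacing <code>[[:alnum:]]</code> with <code>[a-zA-Z\d]</code>. The function name <code>remove_words</code> is hypothetical.</p>

```python
import re

# Consuming (non-lookahead) variant: the whole word is matched by one of the
# two alternatives and must end at a word boundary.
# Assumption: [[:alnum:]] is approximated by the ASCII class [a-zA-Z\d].
ALT_PATTERN = re.compile(
    r"\s*\b(?:[a-zA-Z\d]*([a-zA-Z\d])\1{3}[a-zA-Z\d]*|\d+)\b"
)


def remove_words(text):
    """Hypothetical helper mirroring gsub(p, "", x) from the R snippet."""
    return ALT_PATTERN.sub("", text)


print(repr(remove_words("lkjh ggggqwe 1234 aqwe iphone5224s")))
print(repr(remove_words("All aaaaaab the best 8965")))
```

<p>Unlike the lookahead version, this one needs the trailing <code>\b</code> so that a digit run such as the one inside <code>iphone5224s</code> is not removed on its own.</p>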
<p>Update:</p>
<p>To also remove single-letter words, you may use the following:</p>
<ol>
<li><strong>R</strong> (add <code>[[:alpha:]]|</code> to the alternation group): <code>\s*\b(?:[[:alpha:]]|[[:alnum:]]*([[:alnum:]])\1{3}[[:alnum:]]*|[0-9]+)\b</code> (see the <a href="https://regex101.com/r/sL0wE8/3" rel="nofollow">demo</a>)</li>
<li><strong>Python</strong> lookaround-based regex (<a href="https://regex101.com/r/qA0aS0/2" rel="nofollow">add</a> <code>[a-zA-Z]\b|</code> to the lookahead group): <code>\s*\b(?=[a-zA-Z]\b|\d+\b|[a-zA-Z\d]*([a-zA-Z\d])\1{3})[a-zA-Z\d]+</code></li>
</ol>
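<p>A small sketch of the updated Python pattern in use. The function name <code>clean_with_single_letters</code> and the sample inputs are made up for illustration only.</p>

```python
import re

# Updated lookahead: a single letter followed by a word boundary is now
# a third reason to remove the word.
PATTERN_V2 = re.compile(
    r"\s*\b(?=[a-zA-Z]\b|\d+\b|[a-zA-Z\d]*([a-zA-Z\d])\1{3})[a-zA-Z\d]+"
)


def clean_with_single_letters(text):
    """Hypothetical helper: also drops one-letter words, then trims the ends."""
    return PATTERN_V2.sub("", text).strip()


print(repr(clean_with_single_letters("a qqqq qwerty 1 x poiks")))
print(repr(clean_with_single_letters("I am 5 years")))
```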