<p>Edit: improved answer</p>
<p>The cause of this problem lies in how the regex engine moves through the string while it is matching.</p>
<p>Normally, after matching the string from position 1:</p>
<pre><code>patern = r"[G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}"
GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG
^
</code></pre>
<p>the regex advances one position:</p>
<pre><code>GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG
 ^
</code></pre>
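<p>and starts matching from there.<br/>
So this will not match
GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGG (because it starts at the same position as the first match)</p>
<p>and it will not match
GGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGG
(because it starts at the same position as the second match)</p>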
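<p>As a quick check of this behaviour, here is a minimal sketch that runs the pattern without any lookaround through <code>re.findall</code>. The names <code>plain</code> and <code>plain_matches</code> are only illustrative. Only one match is reported, so the shorter substrings that share a starting position with it never show up.</p>

```python
import re

# The pattern from this answer, without any lookaround.
plain = r"[G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}"
given = "GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG"

# findall reports non-overlapping matches only: once a match is found,
# the search resumes after the end of that match.
plain_matches = re.findall(plain, given)
print(len(plain_matches))           # 1
print(plain_matches[0] == given)    # True: the single match spans the whole string
```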
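<p>Wrapping the pattern in a lookaround with a capture group makes the match zero-width, so the engine tries every starting position and overlapping matches are found as well. The Python code below uses a positive lookahead for this, because Python's <code>re</code> module only accepts fixed-width patterns inside a lookbehind.<br/>
Regex:</p>
<pre class="lang-regex prettyprint-override"><code>(?=(pattern_to_match))
// ?= indicates positive lookahead
</code></pre>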
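<p>The Python code in this answer writes the lookaround as <code>(?=(...))</code>, which is a positive lookahead around a capture group. A minimal sketch on a toy string shows the effect. The toy string and the variable names are assumptions made for illustration and are not part of the original question.</p>

```python
import re

toy = "ababa"

# Without a lookaround, the second "aba" overlaps the first and is skipped.
plain_hits = re.findall(r"aba", toy)

# The lookahead consumes nothing, so every start position is tried;
# findall returns the contents of the capture group.
overlapping_hits = re.findall(r"(?=(aba))", toy)

print(plain_hits)        # ['aba']
print(overlapping_hits)  # ['aba', 'aba']

# Python's re module rejects variable-width lookbehind patterns.
try:
    re.compile(r"(?<=(G{3,6}))")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)     # False
```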
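<p><strong>Warning: the example below may yield duplicate results (finding one occurrence twice)</strong></p>
<p>So I tried a recursive function in Python that re-matches each match with its last character removed, i.e. <code>result[0:len(result)-1]</code>.</p>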
<pre><code>
import re

def regexRecursive(patern, subject):
    # Collect every match, then re-match each result with its last character removed.
    master_results = []
    results = re.findall(patern, subject)
    for result in results:
        master_results.append(result)
        length = (len(result)-1)
        subresults = regexRecursive(patern, result[0:length])
        for subresult in subresults:
            master_results.append(subresult)
    return master_results

# (?=(...)) is a positive lookahead around a capture group; findall returns the group.
patern = r"(?=([G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}))"
given = "GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG"
expected_yield = ['GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG',
                  'GGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG',
                  'GGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG',
                  'GGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG',
                  'GGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG',
                  'GGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG',
                  'GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGG',
                  'GGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGG']

results = regexRecursive(patern, given)

# Report anything found that was not expected (printed in red).
for result in results:
    if result not in expected_yield:
        print ("\033[91m","Found Unexpected: ",result,"\033[0m")

# Report each expected substring as missing (red) or found (green).
for expected in expected_yield:
    if expected not in results:
        print ("\033[91m","Missing Expected: ",expected,"\033[0m")
    else:
        print ("\033[92m","Found Expected: ",expected,"\033[0m")
</code></pre>
<p>It yields the complete set of 8 expected substrings.</p>
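<p>Because the same substring can be reached along more than one path, one possible way to avoid duplicate results is to remember what has already been seen. The following is only a sketch of that idea, not part of the original approach: the function name <code>unique_overlapping_matches</code> is hypothetical, and it assumes the pattern contains exactly one capture group inside the lookahead, so that <code>re.findall</code> returns plain strings.</p>

```python
import re

def unique_overlapping_matches(pattern, subject):
    """Same idea as re-matching each result minus its last character,
    but every distinct substring is reported only once."""
    seen = {}            # dict keeps insertion order, so first-seen order is preserved
    pending = [subject]  # strings that still have to be scanned
    while pending:
        text = pending.pop()
        for match in re.findall(pattern, text):
            if match not in seen:
                seen[match] = None
                # A repeated match would only produce repeated sub-results,
                # so only new matches are queued for re-matching.
                pending.append(match[:-1])
    return list(seen)

pattern = r"(?=([G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}[ACTG]{1,33}[G]{3,6}))"
given = "GGGGAGAAGGGGGGCCTTCCTGGGTCCCCGAGAGTGCAGACATGCCTGGGTCCACAGCCACGGTTTGGG"

unique_results = unique_overlapping_matches(pattern, given)
for substring in unique_results:
    print(substring)
```

<p>The order of the output differs from the recursive version, since the work list is processed last-in, first-out. Sort the list if a stable order matters.</p>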