<p>You need to use capturing groups, like this:</p>
<p><strong>Regex:</strong> <code>((CAV-\d+\s)[AGCT]+)(?:\n\2[AGCT]+)*</code></p>
<p><strong>Explanation:</strong></p>
<ol>
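<li><p><code>((CAV-\d+\s)[AGCT]+)</code> matches one line of your pattern and captures it as group 1. The prefix sub-match <code>CAV-\d+\s</code> is captured as group 2.</p></li>
<li><p><code>(?:\n\2[AGCT]+)*</code> matches any number of following lines that start with the same prefix, using the backreference <code>\2</code> to the text captured by group 2.</p></li>
<li><p>Finally, replace the entire match with the first captured group (<code>\1</code>), i.e. the first line of the run.</p></li>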
</ol>
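<p>The three steps above can be tried on a small in-memory string. This is only a minimal sketch: the sample data below is made up for illustration (it is not the asker's file), and the pattern uses <code>\d+</code> so that multi-digit IDs would also match.</p>
<pre><code>import re

# Hypothetical sample data: consecutive lines that share a "CAV-n " prefix.
sample = (
    "CAV-1 AGCT\n"
    "CAV-1 GGCA\n"
    "CAV-2 TTAG\n"
    "CAV-3 CCGA\n"
    "CAV-3 ATAT\n"
    "CAV-3 GCGC"
)

# Group 1 is the first line of a run; group 2 is the shared prefix.
pattern = re.compile(r'((CAV-\d+\s)[AGCT]+)(?:\n\2[AGCT]+)*')

# Replace each run of same-prefix lines with its first line only.
deduplicated = pattern.sub(r'\1', sample)
print(deduplicated)
</code></pre>
<p>Only consecutive lines with the same prefix are collapsed, because the backreference <code>\2</code> has to match immediately after each newline.</p>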
<p><strong><a href="https://regex101.com/r/KLnzaK/1/" rel="nofollow noreferrer">Regex101 Demo</a></strong></p>
<p><strong>Python code</strong> (tested in Python 3.5.2)</p>
<pre><code>import re

# Input file holding the genetic sequences. Use your own file path.
new1 = 'C:\\Users\\acer\\Desktop\\new1.txt'
# Output file for the de-duplicated data. Use your own file path.
new2 = 'C:\\Users\\acer\\Desktop\\new2.txt'

# Read the whole file as one string so the regex can match across lines.
with open(new1, 'r') as fp1:
    data = fp1.read()

# Collapse each run of consecutive lines that share the same "CAV-n " prefix
# into the first line of that run.
data = re.sub(r'((CAV-\d+\s)[AGCT]+)(?:\n\2[AGCT]+)*', r'\1', data)

# Write the result to the new file; "with" closes both files automatically.
with open(new2, 'w') as fp2:
    fp2.write(data)
</code></pre>