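<p>As other users have pointed out, regex is not the best tool for this problem. You can group the sequences by name in a dictionary and then keep only the first sequence for each name, which drops the duplicates:</p>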
<pre><code>from collections import defaultdict
d = defaultdict(list)
s = ["CAV-1 ATCTACTTCTATCG", "CAV-2 GCGCGTAGCTAGCT", "CAV-2 AAGCGCTCGTAAAA", "CAV-3 AAATATATATATCC"]
# Group every sequence under its name
for name, sequence in [i.split() for i in s]:
    d[name].append(sequence)
# Keep only the first sequence recorded for each name
final_output = [' '.join([a, b[0]]) for a, b in d.items()]
</code></pre>
<p>Output:</p>
<pre><code>['CAV-1 ATCTACTTCTATCG', 'CAV-2 GCGCGTAGCTAGCT', 'CAV-3 AAATATATATATCC']
</code></pre>
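<p>If only the first sequence per name is ever needed, the lists can be skipped altogether. The following is a minimal sketch of that variant, not part of the approach above: it assumes Python 3.7+ (where a plain <code>dict</code> preserves insertion order) and that every entry is exactly a name and a sequence separated by whitespace. The name <code>first_seen</code> is introduced here purely for illustration.</p>

```python
s = ["CAV-1 ATCTACTTCTATCG", "CAV-2 GCGCGTAGCTAGCT", "CAV-2 AAGCGCTCGTAAAA", "CAV-3 AAATATATATATCC"]

# first_seen is a hypothetical name: it maps each name to the first sequence seen for it
first_seen = {}
for line in s:
    name, sequence = line.split()
    # setdefault stores the value only if the key is not already present,
    # so later duplicates of the same name are ignored
    first_seen.setdefault(name, sequence)

final_output = [' '.join([name, sequence]) for name, sequence in first_seen.items()]
print(final_output)
```

<p>This should yield the same list as the <code>defaultdict</code> version while storing one sequence per name instead of all of them.</p>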