<p>With the Python regex module you can specify how many errors (mismatches) a match is allowed to contain.</p>
<pre><code>import collections

import regex  # third-party module, intended as a replacement for re
from Bio import SeqIO

d = collections.defaultdict(list)
# {e<4} = fewer than 4 errors (substitutions, insertions or deletions)
motif = r'((atcttgttcaatggccgatc)(....)(gtcgacaatcaa)){e<4}'
# infile is the path to the FASTQ file and must be defined beforehand
for record in SeqIO.parse(infile, "fastq"):
    seq = str(record.seq)
    match = regex.search(motif, seq, regex.BESTMATCH)
    if match is None:  # no match within the error limit, skip this read
        continue
    barcode = match.group(3)
    sequence = match.group(0)
    d[barcode].append(sequence)  # key = barcode, value = list of sequences
for k, v in d.items():
    print("barcode = %s" % k)
    for i in v:
        print("sequence = %s" % i)
</code></pre>
<p>With these capture groups, group 3 (the fourth group if the whole match, group 0, is counted) holds the barcode.</p>
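<p>To make the idea of "allowed mismatches" concrete, here is a minimal standard-library sketch of the same barcode grouping. It is an approximation, not what the regex module does internally: it counts substitutions only (Hamming distance) in the two flanking sequences, whereas <code>{e<4}</code> also permits insertions and deletions. The names <code>LEFT</code>, <code>RIGHT</code>, <code>find_barcode</code> and <code>group_by_barcode</code> are hypothetical, and lowercasing the reads is an assumption made so that upper-case FASTQ sequences still match the lower-case flanks.</p>

```python
from collections import defaultdict

LEFT = "atcttgttcaatggccgatc"  # left flank, taken from the motif
RIGHT = "gtcgacaatcaa"         # right flank, taken from the motif
BARCODE_LEN = 4                # width of the (....) group


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_barcode(seq, max_errors=3):
    """Return (barcode, matched_text) for the window with the fewest
    flank mismatches, or None if no window has at most max_errors."""
    total = len(LEFT) + BARCODE_LEN + len(RIGHT)
    best = None
    for start in range(len(seq) - total + 1):
        window = seq[start:start + total]
        left = window[:len(LEFT)]
        right = window[len(LEFT) + BARCODE_LEN:]
        errors = hamming(left, LEFT) + hamming(right, RIGHT)
        if errors <= max_errors and (best is None or errors < best[0]):
            barcode = window[len(LEFT):len(LEFT) + BARCODE_LEN]
            best = (errors, barcode, window)
    return None if best is None else best[1:]


def group_by_barcode(seqs, max_errors=3):
    """Map each barcode to the list of matched sequences carrying it."""
    groups = defaultdict(list)
    for seq in seqs:
        hit = find_barcode(seq.lower(), max_errors)  # assumption: ignore case
        if hit is not None:
            barcode, matched = hit
            groups[barcode].append(matched)
    return dict(groups)
```

<p>Calling <code>group_by_barcode</code> with a list of read strings returns a plain dictionary keyed by barcode. The default <code>max_errors=3</code> mirrors the "fewer than 4 errors" limit of the motif, and reads with no acceptable window are skipped.</p>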