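<p>Here is a robust solution that searches a database file for a list of genes, prints the results in <code>fasta</code> format, and lists any genes that were not found.</p>
<p>Note that the database may contain multiple sequence records for the same gene name, so you may need additional filtering to get the sequence you actually want.</p>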
<pre><code>from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

data = "embl.dat"           # Path to EMBL database file
search = "gene_names.txt"   # Path to file with search terms

# Load the search terms from file and strip linefeed characters
with open(search) as handle:
    search_genes = handle.read().splitlines()
found_genes = set()

# Search the EMBL database file
for record in SeqIO.parse(data, "embl"):
    utr5_features = [f for f in record.features if f.type == "5'UTR"]
    for utr5feature in utr5_features:
        # Not every feature carries a 'gene' qualifier, so default to an empty list
        genes = utr5feature.qualifiers.get("gene", [])
        for s in search_genes:
            if s in genes:
                found_genes.add(s)
                # Gene found. Print a modified copy of the record in the desired format
                hit = SeqRecord(record.seq, id="_".join(genes), name=record.name,
                                description=record.description)
                print(hit.format("fasta"))

# List any search terms that were not found in the database
for s in search_genes:
    if s not in found_genes:
        print(s + " NOT FOUND IN DATABASE!")
</code></pre>
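<p>One possible form of the additional filtering mentioned above is to keep only the longest record per gene name. The sketch below is just an illustration of that idea, not part of the original solution: <code>longest_per_gene</code> and the <code>Record</code> namedtuple are hypothetical names, and <code>Record</code> merely stands in for any object with <code>id</code> and <code>seq</code> attributes (such as a <code>SeqRecord</code>). Whether "longest" is the right criterion depends on your data.</p>

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord; only .id and .seq are used.
Record = namedtuple("Record", ["id", "seq"])


def longest_per_gene(records):
    """Return a dict mapping each record id to the longest record with that id."""
    best = {}
    for rec in records:
        current = best.get(rec.id)
        # On ties the first record seen is kept, so the result is deterministic.
        if current is None or len(rec.seq) > len(current.seq):
            best[rec.id] = rec
    return best


# Made-up example hits: two records share the name "geneA".
hits = [
    Record("geneA", "ACGT"),
    Record("geneA", "ACGTACGT"),
    Record("geneB", "TTGA"),
]
unique_hits = longest_per_gene(hits)
for gene, rec in unique_hits.items():
    print(gene, len(rec.seq))
```

<p>To use this with the search loop, you could collect the matching records in a list instead of printing them immediately, pass that list through the filter, and then print the surviving records.</p>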