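<p>It looks like your data is a FASTA file containing protein sequences. Rather than using regular expressions, you should consider installing <a href="http://biopython.org/wiki/Main_Page" rel="noreferrer">BioPython</a>, a library designed specifically for bioinformatics work and research.</p>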
<blockquote>
<p>The goal of Biopython is to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and classes. Biopython features include parsers for various Bioinformatics file formats (BLAST, Clustalw, FASTA, Genbank,...), access to online services (NCBI, Expasy,...), interfaces to common and not-so-common programs (Clustalw, DSSP, MSMS...), a standard sequence class, various clustering modules, a KD tree data structure etc. and even documentation.</p>
</blockquote>
<p>With BioPython, you can extract the sequence for a given identifier from a FASTA file like this:</p>
<pre><code>from Bio import SeqIO

input_file = r'C:\path\to\proteins.fasta'
record_id = 'Px016979'

# Parse the whole file and index the records by their ID
record_dict = SeqIO.to_dict(SeqIO.parse(input_file, 'fasta'))
record = record_dict[record_id]  # raises KeyError if the ID is not in the file
sequence = str(record.seq)
print(sequence)
</code></pre>
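<p>To make the ID-to-sequence lookup concrete, here is a rough pure-Python sketch of what such a mapping amounts to. It is not BioPython's implementation: the <code>read_fasta</code> helper and the sample records are made up for illustration, and it assumes the record ID is the first word after <code>&gt;</code>. For real data BioPython is still the better choice, since it handles the edge cases this sketch ignores.</p>

```python
def read_fasta(lines):
    """Map each record ID (first word after '>') to its full sequence string.

    Hypothetical helper for illustration only; not part of BioPython.
    """
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # Header line: the ID is assumed to be the first whitespace-separated token
            header = line[1:].split()
            current_id = header[0] if header else ''
            records[current_id] = []
        elif current_id is not None:
            # Sequence line: a record's sequence may span several lines
            records[current_id].append(line)
    return {rid: ''.join(parts) for rid, parts in records.items()}


# Made-up sample data in FASTA layout
example = """>Px016979 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>Px000001 another made-up record
MSDNE
""".splitlines()

sequences = read_fasta(example)
# .get() avoids a KeyError when the requested ID is missing
print(sequences.get('Px016979', 'ID not found'))
```

<p>For a real file you could pass an open file object instead of the sample lines, since the helper only needs an iterable of lines.</p>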