<p>Take a look at this:</p>
<p><a href="https://www.kaggle.com/xiangma/orf-finder?scriptVersionId=6709465" rel="nofollow noreferrer">https://www.kaggle.com/xiangma/orf-finder?scriptVersionId=6709465</a></p>
<p>As the link above shows, there are two ways to do this.</p>
<p>Note that I set a minimum ORF length limit (over 1000 bp); you can adjust it as needed.</p>
<p>The first:</p>
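<pre><code>from Bio import SeqIO

# Length threshold in bp, stop codon included. This snippet filters at 1300; adjust as needed.
MIN_ORF_LENGTH = 1300

for record in SeqIO.parse('dna2.fasta', 'fasta'):
    # Scan both strands; positions on strand -1 are relative to the reverse complement.
    for strand, seq in (1, record.seq), (-1, record.seq.reverse_complement()):
        for frame in range(3):
            # Trim to a whole number of codons before translating.
            length = 3 * ((len(seq) - frame) // 3)
            protein = seq[frame:frame + length].translate(table=1)
            # Drop the last piece: it is not terminated by a stop codon.
            for pro in protein.split("*")[:-1]:
                if 'M' in pro:
                    # The ORF runs from the first methionine to the stop codon.
                    orf = pro[pro.find('M'):]
                    # 1-based nucleotide start of the first occurrence of this ORF.
                    pos = protein.find(orf) * 3 + frame + 1
                    # The extra 3 counts the stop codon.
                    if len(orf) * 3 + 3 > MIN_ORF_LENGTH:
                        print("{}...{} - length {}, strand {}, frame {}, pos {}, name {}".format(
                            orf[:3], orf[-3:], len(orf) * 3 + 3, strand, frame, pos, record.id))
</code></pre>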
<p>The second uses a regular expression:</p>
^{pr2}$
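<p>As a rough illustration of the regex idea, here is a minimal sketch that uses only Python's standard <code>re</code> module. It is not the code from the linked notebook: the names <code>ORF_PATTERN</code>, <code>reverse_complement</code>, and <code>find_orfs</code> are hypothetical, and it assumes the input contains only A, C, G, and T. The lookahead lets ORFs in different frames overlap, and the lazy codon repeat stops at the first in-frame stop codon.</p>

```python
import re

# Hypothetical pattern: ATG, then as few codons as possible, then a stop codon.
# Wrapping it in a lookahead lets finditer report overlapping matches.
ORF_PATTERN = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_orfs(dna, min_length=0):
    """Return (strand, frame, start, length) tuples for each ORF found.

    start is 1-based on the scanned strand (for strand -1 that is the
    reverse complement); length is in bp and includes the stop codon.
    """
    results = []
    forward = dna.upper()
    for strand, seq in ((1, forward), (-1, reverse_complement(forward))):
        seen_stops = set()
        for match in ORF_PATTERN.finditer(seq):
            orf = match.group(1)
            start = match.start()
            stop_end = start + len(orf)
            # Matches arrive left to right, so the first ATG seen for a given
            # stop codon gives the longest ORF; skip later, nested ATGs.
            if stop_end in seen_stops:
                continue
            seen_stops.add(stop_end)
            if len(orf) >= min_length:
                results.append((strand, start % 3, start + 1, len(orf)))
    return results


if __name__ == "__main__":
    for hit in find_orfs("ATGAAATAG"):
        print("strand {}, frame {}, pos {}, length {}".format(*hit))
```

<p>Passing <code>min_length=1000</code> would apply the same kind of length filter as the first approach. Because the lookahead is tried at every position, this sketch can be slow on long sequences.</p>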