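I want to find the start and stop codons in my CDS. I've used a regular expression, but when I run this Python script I get an empty list of stop codons. Any help is greatly appreciated.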
```python
import re

# Standard genetic code: codon -> "one-letter|three-letter" amino acid ('*' = stop)
std_code = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
    "TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
    "TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
    "CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
    "CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
    "CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
    "ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
    "AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
    "GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
    "GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
    "GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}

cds = "ATGCTAGCGGTAAATCGTGAATAGGCCTAA"

# Print the CDS one codon per line
for i in range(0, len(cds), 3):
    print(cds[i:i+3])

def translate(cds, std_code):
    # Concatenate the table entry for each codon
    protein = ""
    for i in range(0, len(cds), 3):
        codon = cds[i:i+3]
        protein = protein + std_code[codon]
    return protein

print(translate(cds, std_code))

def codon_usage(cds):
    # Count how often each codon occurs
    usage = {}
    for i in range(0, len(cds), 3):
        codon = cds[i:i+3]
        if codon in usage:
            usage[codon] += 1
        else:
            usage[codon] = 1
    return usage

print(codon_usage(cds))

# One or more ATG (captured as group 1) immediately followed by a stop codon
pat = '(ATG)+?(?:TAA|TGA|TAG)'
reg = re.compile(pat)

def stop_codons(cds, messages=None, s=0, reg=reg):
    stop_codons = []
    while True:
        ma = reg.search(cds[s:])
        if ma:
            if ma.group(1) == 'ATG':
                break
            else:
                stop_codons.append(ma.group().upper())
            s = s + ma.start() + 1
        else:
            break
    return stop_codons

print(stop_codons(cds, messages=None, s=0, reg=reg))
```
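I see a few problems with your `stop_codons` function. First, your regular expression: as written, it looks for one or more repeats of your start codon immediately followed by a stop codon. In your example CDS the only ATG is followed by CTA, so the pattern never matches and `reg.search` returns `None`. Even when it does match, group 1 only captures the start codon and nothing else. If you want to find the first stop codon after the start codon, the pattern has to allow other codons between the two.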
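A minimal sketch of such a pattern, assuming an uppercase DNA string: `ATG`, then any number of whole codons matched lazily, then the first in-frame stop codon. The pattern and the `find_stop_codons` helper are hypothetical illustrations, not code from the original answer. Unlike the loop in the question, it records every match instead of breaking out, and it reports each stop codon together with its position.

```python
import re

# Hypothetical pattern: ATG, then zero or more whole codons (lazy),
# then the first stop codon in the same reading frame (captured as group 1).
ORF_PATTERN = re.compile(r'ATG(?:[ACGT]{3})*?(TAA|TAG|TGA)')


def find_stop_codons(cds, pattern=ORF_PATTERN):
    """Return (stop_codon, position) for each ATG that has an in-frame stop downstream."""
    results = []
    start = 0
    while True:
        # search(string, pos) keeps match positions relative to the whole string
        match = pattern.search(cds, start)
        if match is None:
            break
        # group(1) is the stop codon, start(1) is its 0-based index in cds
        results.append((match.group(1), match.start(1)))
        # resume just after this ATG so later (or nested) start codons are also tried
        start = match.start() + 1
    return results


cds = "ATGCTAGCGGTAAATCGTGAATAGGCCTAA"
print(find_stop_codons(cds))  # [('TAG', 21)]
```

Passing a start position to `search` instead of slicing `cds[s:]` avoids having to add the slice offset back when reporting positions.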
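Second, whenever the regex matches, `ma.group(1)` is always `'ATG'` (the group has to match at least once), so the function breaks out of the loop and returns the `stop_codons` list before anything has been appended. Assuming every sequence passed to this function contains a start codon before the stop codon, it will therefore always return an empty list. Hope this helps.

Have you considered using Biopython? It has a variety of built-in tools for working with sequences and can save you a lot of time if you're doing anything bioinformatics-related. In particular, its sequence object is exactly what you're looking for.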
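I strongly recommend reading the tutorial and the API documentation for information on alphabets, codon tables (`Bio.Data.CodonTable`), and so on. (Note that `Bio.Alphabet` was removed in Biopython 1.78.)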
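If adding a dependency is not an option, the core idea (translate from the start codon and stop at the first stop codon) can be sketched with the standard library alone. This is a minimal sketch, not Biopython's API: `STANDARD_TABLE` and `translate_to_stop` are hypothetical names, the table encodes the same standard genetic code as the question's `std_code` (one-letter symbols only), and the input is assumed to be an uppercase string of A, C, G and T.

```python
import itertools

# Standard genetic code (NCBI table 1) in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG map to these 64 one-letter symbols ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_to_stop(cds, table=STANDARD_TABLE):
    """Translate from the first ATG up to the first in-frame stop codon.

    Returns (protein, stop_codon); stop_codon is None if no stop codon follows.
    Codons containing characters other than A, C, G, T raise KeyError.
    """
    start = cds.find("ATG")
    if start == -1:
        return "", None  # no start codon at all
    protein = []
    # Step through whole codons only, so a trailing partial codon is ignored
    for i in range(start, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = table[codon]
        if aa == "*":
            return "".join(protein), codon
        protein.append(aa)
    return "".join(protein), None


print(translate_to_stop("ATGCTAGCGGTAAATCGTGAATAGGCCTAA"))  # ('MLAVNRE', 'TAG')
```

Biopython's own translation also supports alternative codon tables, which this sketch does not.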