I have to write a script to translate this sequence:
dict = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}
seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
a=""
for y in range(0, len(seq)):
    c = seq[y:y+3]
    #print(c)
    for k, v in dict.items():
        if seq[y:y+3] == k:
            alle_amino = v[::3]  # all amino acids in a row: a1.1 - a2.1 - a3.1 - a1.2 and so on
            print(v)
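With this script I get the amino acids of the three frames interleaved, one after another. How can I sort this so that all the amino acids of frame 1 appear next to each other, then all those of frame 2, and then all those of frame 3?
For example, my result must be: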
+3 SerIleLeuAlaStpProLysTrpGluProProTyrValAlaStpProIleTyrIleTyrIle
+2 PheAsnThrSerMetThrLysValGlyThrProLeuArgSerMetThrHisIleTyrIleTyr
+1 PheGlnTyrStpHisAspGlnSerGlyAsnProLeuThrStpHisAspProTyrIleTyrIle
TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA
I am using Python 3.
I have one more question: can I get these results by making some changes to my own script?
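Here is my solution. I renamed your "dict" variable to "aminos". The function method3 returns a list of the values to the right of the "|". To merge them into a single string, just join them on "". Judging from your code, I believe your aminos dict contains every possible three-letter combination, so I removed the check that verified this. The result should be much faster.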
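The code of this answer is not shown here, so the following is only a minimal sketch of the idea it describes, not the answer's own method3. The function name three_letter_codes and its frame parameter are hypothetical. It looks each codon up directly in aminos instead of scanning every dictionary entry, and it assumes the sequence contains only the letters A, C, G and T (any other letter raises a KeyError).

```python
aminos = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}


def three_letter_codes(seq, frame):
    """Return the three-letter codes for one reading frame (0, 1 or 2).

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Step through the sequence three bases at a time, starting at `frame`.
    # Stopping at len(seq) - 2 skips an incomplete codon at the end.
    return [aminos[seq[i:i + 3]].split("|")[1]
            for i in range(frame, len(seq) - 2, 3)]


seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"

# Joining the list on "" gives one string for the chosen frame.
first_frame = "".join(three_letter_codes(seq, 0))
print(first_frame)
```

A direct dictionary lookup costs about the same per codon no matter how large the table is, whereas the nested loop in the question compares every codon against all 64 keys. That is the kind of speed-up the answer refers to.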
Not very pretty, but it does what you want.
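You can use the following (note that it would be much easier to use Biopython's translate method):
Note that I renamed your dictionary to dicti (so as not to overwrite the built-in dict). Some comments to help you understand: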
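translate takes the sequence and returns a list in which each item is the amino acid translation of the triplet at that position, in the same form as the dictionary values (for example "F|Phe"). You can process this data further inside translate (to keep only the one-letter or the three-letter code), or return it as I did.
translate is called once for each frame. For frame values 0, 1 and 2, seq[frame:] is passed as the argument to translate. That is, you pass the sequences corresponding to the three different reading frames and process them one after another.
Then I split the one-letter and three-letter codes of each amino acid and take the one at index 1 (the second one), and join them into a single string.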
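The steps described above can be sketched as follows. This is a reconstruction under assumptions rather than the answer's original code: the names dicti and translate come from the answer, while the body of translate, the lines list and the output formatting are illustrative guesses. As written it assumes a sequence made up only of A, C, G and T.

```python
dicti = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}


def translate(seq):
    """Return one dicti entry (such as "F|Phe") per complete codon in seq."""
    # Walk the sequence in steps of three; an incomplete trailing codon is skipped.
    return [dicti[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)]


seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"

lines = []
for frame in (2, 1, 0):  # print +3 first, as in the requested output
    entries = translate(seq[frame:])
    # Each entry looks like "F|Phe"; index 1 after the split is the three-letter code.
    protein = "".join(entry.split("|")[1] for entry in entries)
    lines.append(f"+{frame + 1} {protein}")
lines.append(seq)

print("\n".join(lines))
```

Taking index 0 instead of index 1 after the split would give the one-letter codes. Slicing with seq[frame:] keeps translate itself unaware of reading frames, so the same function serves all three.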