How do I stop my script from going out of range?

Posted 2024-05-19 17:07:13


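I was bored and wanted to practise Python, so I thought I'd write a script that converts some genetic code into an amino-acid sequence. It looks at one letter at a time. When it sees a particular sequence (the start codon), it starts converting each three-letter code into its equivalent amino acid and strings them together, until it reaches a three-letter code that does not code for an amino acid (a stop codon). The script then goes back to the position where it started converting and resumes iterating through the code until it finds another start sequence.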

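Up to a point, the script works. I started out using a while loop to step through the triplets of genetic code after the start sequence, but when it reaches the end of the genetic code it goes out of range (IndexError: string index out of range):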

#!/usr/bin/env python3

import sys
import re

def main():
    # Standard genetic code: lower-case DNA codon ('t', not 'u') -> one-letter amino acid
    translation = {'gca': 'A', 'gcc': 'A', 'gcg': 'A', 'gct': 'A', 'tgc': 'C', 'tgt': 'C',
                   'gac': 'D', 'gat': 'D', 'gaa': 'E', 'gag': 'E', 'ttc': 'F', 'ttt': 'F',
                   'gga': 'G', 'ggg': 'G', 'ggc': 'G', 'ggt': 'G', 'cac': 'H', 'cat': 'H',
                   'ata': 'I', 'atc': 'I', 'att': 'I', 'aaa': 'K', 'aag': 'K',
                   'tta': 'L', 'ttg': 'L', 'cta': 'L', 'ctc': 'L', 'ctg': 'L', 'ctt': 'L',
                   'atg': 'M', 'tgg': 'W', 'tac': 'Y', 'tat': 'Y'}
    translation.update(dict.fromkeys(['aac', 'aat'], 'N'))
    translation.update(dict.fromkeys(['cca', 'ccc', 'ccg', 'cct'], 'P'))
    translation.update(dict.fromkeys(['caa', 'cag'], 'Q'))
    translation.update(dict.fromkeys(['aga', 'agg', 'cga', 'cgc', 'cgg', 'cgt'], 'R'))
    translation.update(dict.fromkeys(['agc', 'agt', 'tca', 'tcc', 'tcg', 'tct'], 'S'))
    translation.update(dict.fromkeys(['aca', 'acc', 'acg', 'act'], 'T'))
    translation.update(dict.fromkeys(['gta', 'gtc', 'gtg', 'gtt'], 'V'))
    translation.update(dict.fromkeys(['taa', 'tga', 'tag'], 'STOP'))

    rna = ""
    # Read the sequence from the file named on the command line, keeping only
    # nucleotide letters (drops the position numbers, spaces and newlines).
    # Text mode already handles universal newlines; the old 'rU' mode was removed in Python 3.11.
    with open(sys.argv[1]) as f:
        for line in f:
            trimmedline = re.sub(r'[^atcgu]', '', line)
            rna = rna + trimmedline

    # This part of the code iterates through the rna string one letter at a time
    # At each letter it grabs the next two letters in the string and joins the three letters together -> codon
    # If the codon matches one of the two start codons it adds the corresponding amino acid to the 'primary' string
    # It then triggers a while loop that moves through the rna string, pulling out triplets and looking them up in the translation dict
    # These values are also added to primary
    # When the while loop reaches a 'STOP' codon it exits, and the for loop begins the process again from the next letter in 'rna'
    # i.e. if the first 'a' in atgcaaca... triggered the while loop, the next letter would be t
    for base in range(len(rna) - 2):
        codon = rna[base] + rna[base + 1] + rna[base + 2]
        if codon == 'aug' or codon == 'atg':
            print('Start codon found at position ' + str(base))
            primary = 'M'  # 'atg' and 'aug' both code for methionine ('aug' is not a key in translation)
            reset = 0
            l = 1
            while reset == 0:
                newcodon = rna[base + (3 * l)] + rna[base + (3 * l) + 1] + rna[base + (3 * l) + 2]
                if translation[newcodon] == 'STOP':
                    reset = 1
                    print(primary)
                    print('------------')
                else:
                    primary = primary + translation[newcodon]
                    # print(primary)
                    l = l + 1

if __name__ == '__main__':
    main()
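Typing out all 64 codons by hand is easy to get wrong. As a sketch only, here is one way to generate the same codon-to-amino-acid mapping from the standard genetic code written as a 64-letter string in TCAG order ('*' marks a stop codon). build_codon_table, BASES and AMINO_ACIDS are hypothetical names used for illustration, and stop codons are mapped to 'STOP' to match the question's dict.

```python
from itertools import product

# Hypothetical names; the standard genetic code listed with the bases in
# t, c, a, g order and the third codon position varying fastest.
BASES = 'tcag'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'


def build_codon_table():
    """Map each lower-case DNA codon to its one-letter amino acid,
    with stop codons mapped to 'STOP' as in the question's dict."""
    # product('tcag', repeat=3) yields ttt, ttc, tta, ttg, tct, ... in the
    # same order as the letters in AMINO_ACIDS
    codons = (''.join(triple) for triple in product(BASES, repeat=3))
    table = {}
    for codon, aa in zip(codons, AMINO_ACIDS):
        table[codon] = 'STOP' if aa == '*' else aa
    return table


translation = build_codon_table()
print(len(translation))      # 64
print(translation['atg'])    # M
print(translation['tga'])    # STOP
```

Comparing this generated table against a hand-written one is a quick way to catch a typo in either.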

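I can't work out how to stop the script from running past the end of the gene sequence. I tried replacing the while loop with another for loop, but I keep getting the error: string indices must be integers, not str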

            for triplet in rna[base + 3:((len(rna)-base)-((len(rna)- base) % 3)): 3]:
                newcodon = rna[triplet] + rna[triplet + 1] + rna[triplet + 2]
#                newcodon = rna[base + (3 * l)] + rna[base + (3 * l) + 1] + rna[base + (3 * l) + 2]
                if translation[newcodon] == 'STOP':
                    reset = 1
                    print(primary)
                    print('------------')
                else:
                    primary = primary + translation[newcodon]
                    l = l + 1
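The for-loop attempt fails because iterating over a string (or a slice of one) yields the characters themselves, not their positions, so triplet is a one-letter str and rna[triplet] raises the TypeError. A minimal sketch of the same for-loop idea that steps over integer positions with range() instead, assuming rna and a translation dict shaped like the question's; read_frame and toy_table are hypothetical names, and toy_table is only a tiny illustrative subset of the genetic code.

```python
# Iterating over a string slice gives characters, not indices:
print(list('atgaaatag'[3:9:3]))  # ['a', 't']


def read_frame(rna, base, translation):
    """Translate the codons after the start codon at `base` until a stop
    codon or the end of the sequence. Returns (protein, found_stop)."""
    primary = 'M'  # the start codon itself codes for methionine
    # range() yields integer positions base+3, base+6, ...; stopping before
    # len(rna) - 2 guarantees that rna[i + 2] is always a valid index
    for i in range(base + 3, len(rna) - 2, 3):
        amino_acid = translation[rna[i:i + 3]]
        if amino_acid == 'STOP':
            return primary, True
        primary += amino_acid
    return primary, False  # reached the end without a stop codon


toy_table = {'aaa': 'K', 'gca': 'A', 'tag': 'STOP'}  # illustrative subset only
print(read_frame('atgaaagcatag', 0, toy_table))  # ('MKA', True)
print(read_frame('atgaaagc', 0, toy_table))      # ('MK', False)
```

Because the range's upper bound already excludes incomplete trailing triplets, no separate bounds check is needed inside the loop.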

Can someone put me out of my misery?

If you want or need some sample data, you can use:

29581 ttttccgttt acgatatata gtctactctt gtgcagaatg aattctcgta actacatagc
29641 acaagtagat gtagttaact ttaatctcac atagcaatct ttaatcagtg tgtaacatta
29701 gggaggactt gaaagagcca ccacattttc accgaggcca cgcggagtac gatcgagtgt
29761 acagtgaaca atgctaggga gagctgccta tatggaagag ccctaatgtg taaaattaat
29821 tttagtagtg ctatccccat gtgattttaa tagcttctta ggagaatgac aaaaaaaaaa
29881 aaaaaaaaaa aaaaaaaaaa aaa

(It's part of the SARS-CoV-2 genome, if you're interested.)


1 Answer
Anonymous user
#1 · Posted 2024-05-19 17:07:13

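You keep incrementing base and l but never check whether you have gone past the end of the rna string. Change the condition of the while loop to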

while reset == 0 and len(rna) > (base + (3 * l) + 2): 

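This stops the script from running past the end of the string. (base + (3 * l) + 2 is the largest index you read from the rna string on each pass, so use it as the test for exiting the while loop.)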
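To check the fix in isolation, here is a sketch that wraps the question's loop in a function and applies the answer's bounds check. find_proteins and toy_table are hypothetical names; the function returns (start position, protein) pairs instead of printing, uses break in place of the reset flag, and, like the question's code with the answer's condition, silently drops chains that reach the end of the sequence without a stop codon.

```python
def find_proteins(rna, translation):
    """Return (start_position, protein) for every start codon that is
    followed in frame by a stop codon, mirroring the question's loop."""
    results = []
    for base in range(len(rna) - 2):
        codon = rna[base:base + 3]
        if codon in ('atg', 'aug'):
            primary = 'M'
            l = 1
            # The answer's check: only read the next triplet while its last
            # index, base + 3*l + 2, still lies inside rna
            while len(rna) > base + 3 * l + 2:
                newcodon = rna[base + 3 * l:base + 3 * l + 3]
                if translation[newcodon] == 'STOP':
                    results.append((base, primary))
                    break
                primary += translation[newcodon]
                l += 1
    return results


toy_table = {'aaa': 'K', 'tag': 'STOP'}  # illustrative subset only
print(find_proteins('atgaaatagcatg', toy_table))  # [(0, 'MK')]
print(find_proteins('ccatgaaa', toy_table))       # []
```

The second call would raise IndexError in the original while loop, because the reading frame starting at position 2 runs off the end without meeting a stop codon.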
