Bioinformatics: find the genes in a given genome string

Posted 2024-06-02 11:51:09


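Biologists model a genome as a sequence of the letters A, C, T, and G. A gene is a substring of a genome that starts after the triplet ATG and ends before one of the triplets TAG, TAA, or TGA. Furthermore, the length of a gene string is a multiple of 3, and the gene does not contain any of the triplets ATG, TAG, TAA, or TGA.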
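The definition above translates fairly directly into a validity check for a candidate gene. A minimal sketch, assuming the forbidden triplets are only checked in reading frame (positions 0, 3, 6, ...) and that an empty string does not count as a gene; `is_valid_gene` is a hypothetical helper name, and the sketch does not check that only A, C, T, G appear:

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}
FORBIDDEN = STOP_CODONS | {"ATG"}


def is_valid_gene(gene: str) -> bool:
    """Return True if `gene` fits the exercise's definition of a gene body.

    Assumption: forbidden triplets are only checked in reading frame.
    """
    if not gene or len(gene) % 3 != 0:
        return False
    # Split into consecutive codons and reject any start or stop codon.
    codons = [gene[i:i + 3] for i in range(0, len(gene), 3)]
    return all(codon not in FORBIDDEN for codon in codons)


print(is_valid_gene("TTT"))      # True
print(is_valid_gene("GGGCGT"))   # True
print(is_valid_gene("GGGC"))     # False: length is not a multiple of 3
print(is_valid_gene("ATGCCC"))   # False: contains ATG in frame
```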

Ideally, the program should behave like this:

Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT #Enter   
TTT
GGGCGT
-----------------
Enter a genome string: TGTGTGTATAT
No Genes Were Found
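Since a new approach was asked for, here is one way to produce exactly the session shown above without regular expressions. It is a sketch under assumptions that are not in the original post: genes do not overlap, a candidate that runs into an in-frame ATG before any stop codon is discarded (the search resumes at the next ATG), and input is upper-cased for convenience. `find_genes`, `format_result`, and `main` are hypothetical names:

```python
STOP_CODONS = ("TAG", "TAA", "TGA")


def find_genes(genome: str) -> list[str]:
    """Return the gene bodies in `genome`, scanning left to right (Python 3.9+)."""
    genes = []
    pos = 0
    while True:
        start = genome.find("ATG", pos)
        if start == -1:
            break                      # no more start codons
        end = -1
        j = start + 3
        # Walk codon by codon; the bound keeps every slice a full triplet.
        while j + 3 <= len(genome):
            codon = genome[j:j + 3]
            if codon in STOP_CODONS:
                end = j
                break
            if codon == "ATG":
                break                  # forbidden triplet inside the gene
            j += 3
        if end > start + 3:            # non-empty gene followed by a stop codon
            genes.append(genome[start + 3:end])
            pos = end + 3              # continue after the stop codon
        else:
            pos = start + 1            # nothing from this ATG; try the next one
    return genes


def format_result(genome: str) -> str:
    """Build the text the exercise expects to print."""
    genes = find_genes(genome.strip().upper())  # upper() is an assumed convenience
    return "\n".join(genes) if genes else "No Genes Were Found"


def main() -> None:
    print(format_result(input("Enter a genome string: ")))


print(format_result("TTATGTTTTAAGGATGGGGCGTTAGTT"))
print(format_result("TGTGTGTATAT"))
```

Calling `main()` reads the genome interactively, as in the sample session; the two `print` calls show the same results non-interactively. Because every position is read with slicing and the inner loop stops while a full triplet still fits, the scan cannot run past the end of the string.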

So far, I have:

(code block missing)

I keep getting an error.

Honestly, this really isn't working for me. I think I've hit a dead end with these lines of code, so a new approach might help.

Thanks in advance!

The error I get:

Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT
Traceback (most recent call last):
  File "D:\Python\Chapter 8\Bioinformatics.py", line 40, in <module>
    main()
  File "D:\Python\Chapter 8\Bioinformatics.py", line 38, in main
    print(findGene(geneinput))
  File "D:\Python\Chapter 8\Bioinformatics.py", line 25, in findGene
    final += (chr[i+i + 3] + "\n")
IndexError: string index out of range
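The traceback points at `chr[i+i + 3]`. The index `i+i + 3` grows twice as fast as `i` (it may have been meant as `i + 3`, though the full code is not shown), so before the loop finishes it runs past the last character. Naming a variable `chr` also shadows Python's built-in `chr()` function. A minimal sketch of a safer pattern, assuming the goal was to read the genome three characters at a time; `codons` is a hypothetical helper:

```python
def codons(seq: str, offset: int = 0) -> list[str]:
    """Split `seq` into consecutive 3-letter codons starting at `offset`.

    The range stops before len(seq) - 2, so seq[i:i + 3] is always a full
    triplet; a trailing partial codon is simply dropped.
    """
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


genome = "TTATGTTTTAAGGATGGGGCGTTAGTT"

# Indexing past the end raises IndexError...
try:
    genome[len(genome) + 3]
except IndexError as exc:
    print("IndexError:", exc)

# ...whereas slicing past the end just returns a shorter (here empty) string.
print(repr(genome[len(genome):len(genome) + 3]))  # ''
print(codons(genome, 2)[:3])                      # ['ATG', 'TTT', 'TAA']
```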

As I said, I'm not sure my current code is on the right track for this problem. Any new ideas, ideally with pseudocode, are appreciated!


1 answer
Anonymous user
Answer #1 · posted 2024-06-02 11:51:09

This can be done with a regular expression:

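import re

# ATG start codon, then one or more whole triplets (lazy), then a stop codon;
# only the triplets between the start and stop codons are captured.
pattern = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')
print(pattern.findall('TTATGTTTTAAGGATGGGGCGTTAGTT'))
print(pattern.findall('TGTGTGTATAT'))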

Output:

['TTT', 'GGGCGT']
[]

Explanation taken from https://regex101.com/r/yI4tN9/3:

"ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)"g
    ATG matches the characters ATG literally (case sensitive)
    1st Capturing group ((?:[ACTG]{3})+?)
        (?:[ACTG]{3})+? Non-capturing group
            Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
            [ACTG]{3} match a single character present in the list below
                Quantifier: {3} Exactly 3 times
                ACTG a single character in the list ACTG literally (case sensitive)
    (?:TAG|TAA|TGA) Non-capturing group
        1st Alternative: TAG
            TAG matches the characters TAG literally (case sensitive)
        2nd Alternative: TAA
            TAA matches the characters TAA literally (case sensitive)
        3rd Alternative: TGA
            TGA matches the characters TGA literally (case sensitive)
    g modifier: global. All matches (don't return on first match)
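One caveat: the lazy `+?` only guarantees that the match ends at the first in-frame stop codon. It does not prevent ATG, or a stop codon used as the very first triplet, from appearing inside the captured gene. For example, the pattern returns ['ATGCCC'] for ATGATGCCCTAA and ['TAA'] for ATGTAATAG, both of which break the "no ATG, TAG, TAA or TGA" rule. A possible tightening, assuming the rule applies to in-frame triplets, adds a negative lookahead `(?!ATG|TAG|TAA|TGA)` before each triplet. `STRICT_GENE` and `LOOSE_GENE` are illustrative names:

```python
import re

# Pattern from the answer above.
LOOSE_GENE = re.compile(r"ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)")

# Hypothetical stricter variant: the zero-width negative lookahead rejects a
# triplet that would itself be ATG or a stop codon, so the captured gene
# never contains one in frame.
STRICT_GENE = re.compile(r"ATG((?:(?!ATG|TAG|TAA|TGA)[ACTG]{3})+?)(?:TAG|TAA|TGA)")

for genome in ["TTATGTTTTAAGGATGGGGCGTTAGTT", "ATGATGCCCTAA", "ATGTAATAG"]:
    print(genome, LOOSE_GENE.findall(genome), STRICT_GENE.findall(genome))
```

On the sample genome both patterns give ['TTT', 'GGGCGT']; on ATGATGCCCTAA the stricter one skips the first ATG and returns ['CCC'], and on ATGTAATAG it finds nothing. In Python, `findall` already returns every non-overlapping match, so no `g` flag is needed.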
