How do I find specific three-letter strings in a sequence?

Posted 2024-10-03 02:34:50


My sequence is as follows:

my_file_m = '''TCCATTCTCTACCCAGCCCCCACTCTGACCCCTTTACTCTGACCCCTTTATTGTCTACTCCTCAGAGCCCCCAGTCTGTA
TCCTTCTAACTTAGAAAGGGGATTATGGCTCAGGGTCCAACTCTGTGCTCAGAGCTTTCAACAACTACTCAGAAACACAA
GATGCTGGGACAGTGACCTGGACTGTGGGCCTCTCATGCACCACCATCAAGGACTCAAATGGGCTTTCCGAATTCACTGG
AGCCTCGAATGTCCATTCCTGAGTTCTGCAAAGGGAGAGTGGTCAGGTTGCCTCTGTCTCAGAATGAGGCTGGATAAGAT'''

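I want to know the positions and the counts of the specific three-letter strings TAA, TGA and TAG. If any are present, I would also like to color them.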

I started with this:

[code block not preserved in the source]

I can't use .count or find, because I have three inputs to search for. Is there a way to find and highlight them?


3 Answers

Use the re.findall function and collections.Counter from the standard library:

import re
from collections import Counter

pat = re.compile(r"(TAA|TGA|TAG)")
c = re.findall(pat,my_file_m)

print(c)
print(Counter(c))

Output:

[output not preserved in the source]
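The question also asks for positions, which a list of matches and a Counter do not give. A minimal sketch of one way to collect both, assuming non-overlapping matches are wanted and that line breaks inside the sequence should be ignored; the helper name find_codons is made up for this illustration:

```python
import re
from collections import Counter

def find_codons(seq, codons=("TAA", "TGA", "TAG")):
    """Return (positions, counts) for non-overlapping matches of the codons.

    find_codons is a hypothetical helper, not part of any library.
    """
    # Drop newlines and other whitespace so a match cannot be split across lines
    clean = "".join(seq.split())
    pattern = re.compile("|".join(codons))
    # Each entry is (0-based start index in the cleaned sequence, matched codon)
    positions = [(m.start(), m.group()) for m in pattern.finditer(clean)]
    counts = Counter(codon for _, codon in positions)
    return positions, counts

positions, counts = find_codons("CCTAAGG\nTGATAG")
print(positions)  # [(2, 'TAA'), (7, 'TGA'), (10, 'TAG')]
print(counts)
```

The indices refer to the sequence after whitespace removal, so they will differ from offsets in the original multi-line string.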

Do you need to split the DNA sequence into groups of three letters to map it onto the genetic code?

If so, see the code below.

my_file_m= '''TCCATTCTCTACCCAGCCCCCACTCTGACCCCTTTACTCTGACCCCTTTATTGTCTACTCCTCAGAGCCCCCAGTCTGTA
TCCTTCTAACTTAGAAAGGGGATTATGGCTCAGGGTCCAACTCTGTGCTCAGAGCTTTCAACAACTACTCAGAAACACAA
GATGCTGGGACAGTGACCTGGACTGTGGGCCTCTCATGCACCACCATCAAGGACTCAAATGGGCTTTCCGAATTCACTGG
AGCCTCGAATGTCCATTCCTGAGTTCTGCAAAGGGAGAGTGGTCAGGTTGCCTCTGTCTCAGAATGAGGCTGGATAAGAT'''

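mm = "".join(my_file_m.split())                 # remove the newline characters

# Split into consecutive three-letter codons; list() is needed in Python 3,
# where map() returns an iterator that has no .count() method
messenger = list(map(''.join, zip(*[iter(mm)]*3)))

print(messenger.count('TAA'))
print(messenger.count('TGA'))
print(messenger.count('TAG'))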

Output:

[output not preserved in the source]
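The zip/iter idiom above is compact but gives only counts. An equivalent sketch with plain slicing that also records where each codon sits might look like this; the function name stop_codons_in_frame and the frame parameter are assumptions added for illustration:

```python
def stop_codons_in_frame(seq, frame=0, stops=("TAA", "TGA", "TAG")):
    """Return (index, codon) pairs for stop codons found in one reading frame.

    Hypothetical helper: frame is the 0-based offset (0, 1 or 2) at which
    the first codon starts.
    """
    clean = "".join(seq.split())  # remove newlines before slicing
    hits = []
    # Step through the sequence three letters at a time; the upper bound
    # skips any incomplete codon at the end
    for i in range(frame, len(clean) - 2, 3):
        codon = clean[i:i + 3]
        if codon in stops:
            hits.append((i, codon))
    return hits

print(stop_codons_in_frame("ATGTAACCCTGA"))           # [(3, 'TAA'), (9, 'TGA')]
print(stop_codons_in_frame("ATGTAACCCTGA", frame=1))  # []
```

Unlike a regex search over the whole string, this only reports codons aligned to the chosen frame, which is the behavior of the three-letter split above.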

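Here is my answer to your question:

Note: this code also finds overlapping sequences. If overlaps should not be counted, remove the '?=' from the pattern.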

import re 

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

my_file_m= '''TTCCATTCTCTACCCAGCCCCCACTCTGACCCCTTTACTCTGACCCCTTTATTGTCTACTCCTCAGAGCCCCCAGTCTGTATCCTTCTAACTTAGAAAGGGGATTATGGCTCAGGGTCCAACTCTGTGCTCAGAGCTTTCAACAACTACTCAGAAACACAAGATGCTGGGACAGTGACCTGGACTGTGGGCCTCTCATGCACCACCATCAAGGACTCAAATGGGCTTTCCGAATTCACTGGAGCCTCGAATGTCCATTCCTGAGTTCTGCAAAGGGAGAGTGGTCAGGTTGCCTCTGTCTCAGAATGAGGCTGGATAAGAT'''


pat = re.compile(r'(?=(TAA|AAT|TGA|TAG))') # Very important, if you do not need overlaps then remove '?='
matches = re.finditer(pat,my_file_m)
result1 = [int(match.start(1)) for match in matches] # find all the starting positions of the string
result2 = [range(x,x+3) for x in result1 ] # find all the positions of the characters (given that we search for patterns of length 3, can be modified for other lengths too )
result3 = set().union(*result2) # generate a union

for chari in range(len(my_file_m)): # colorize based on if it is in a sequence or not
    if chari in result3:
        # end=' ' reproduces the space that Python 2's trailing comma printed
        print(bcolors.OKGREEN + my_file_m[chari] + bcolors.ENDC, end=' ')
    else:
        print(my_file_m[chari], end=' ')
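To see what the '?=' lookahead changes, here is a small comparison on a made-up six-letter string (not the sequence from the question), using the same four alternatives as the pattern above:

```python
import re

seq = "TAATGA"  # toy input chosen so that TAA and AAT overlap

# Without the lookahead, each match consumes its three letters,
# so the search resumes after them and AAT is never seen
non_overlapping = re.findall(r"TAA|AAT|TGA|TAG", seq)

# With the lookahead, each match is zero-width, so every start
# position is tested and overlapping hits are all reported
overlapping = re.findall(r"(?=(TAA|AAT|TGA|TAG))", seq)

print(non_overlapping)  # ['TAA', 'TGA']
print(overlapping)      # ['TAA', 'AAT', 'TGA']
```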

A cleaner version:

[code block not preserved in the source]
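One possible more compact way to do the coloring, offered as a sketch and not as the answer author's own code: collect the covered positions, then wrap each consecutive run in a single pair of ANSI codes. The name colorize and its default codon list are assumptions for illustration.

```python
import re
from itertools import groupby

GREEN = "\033[92m"  # same escape code as bcolors.OKGREEN
RESET = "\033[0m"   # same escape code as bcolors.ENDC

def colorize(seq, codons=("TAA", "TGA", "TAG")):
    """Return seq with every letter covered by a codon match wrapped in green.

    Hypothetical helper; overlapping matches are included via a lookahead.
    """
    pattern = re.compile("(?=(" + "|".join(codons) + "))")
    covered = set()
    for m in pattern.finditer(seq):
        covered.update(range(m.start(1), m.end(1)))
    pieces = []
    # Group consecutive letters by whether they are covered, so each run
    # gets one pair of escape codes instead of one pair per letter
    for is_hit, run in groupby(enumerate(seq), key=lambda p: p[0] in covered):
        text = "".join(ch for _, ch in run)
        pieces.append(GREEN + text + RESET if is_hit else text)
    return "".join(pieces)

print(colorize("CCTAATGACC"))
```

The colors only show up in a terminal that interprets ANSI escape sequences; elsewhere the raw codes are printed.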

Credits: here, here

Output: [image of the colorized sequence]
