How can I find palindromes next to specific letters in a sequence?

Posted 2024-09-29 21:30:55


I have a script that returns the palindromic substrings in a DNA sequence:

sequence="GATCTCTATACCAACTCAAAATGAAGACTCTTCTTTACACTTTCGAGCTCAGCAGGCTTACCGAGAAGAGTCGTCGTTCACATCCCCCCCTGTGCGAGATCAAGAAATTTGGCGACGTCGGCTTATTATCCTCCGCTGTCAATCAGTTGGACACATCTCTCCGGTCACTGCCGGACAAGCCAACCGAAGATTCGATTCTTCAGCAGCTTATCGACATTGCTGGTGGTGAAAAGCCAAGGCACAGCATCATAGTTGCGACCAATACGTCATACGACCGAGAGACATTGGTAAAGATCCTTCAACGATTCCCATACACCATACCTGGTCTGTCAGATTCAGGCTTGGAATCAGAAACACTCGAGGCTCTTGAGCACATCGCTTTTGCATTAGCCGGGCGATTAGCTCATAGATTTGACTACGGGTTCAATCCAGAGGCCAGTATCGTTCAACACCTCGAGATGTTCACCACCCTTTGGCACCAAAGATCTGCATTACCACCTGCGCCTGCCCCGTATCGACTTCCCGTTCCCGTCAATCAAGGAAGAGTCTCCTCATCAGATGATGGCTCTGATACTGAGTCAGAACTGGATGAAAAATACCACAACATCAAGAAGTCAGGACTTTGGAGGTTTCTGGATATGTTCAAAATGAACTTCAAGAGGTCTTAGATAACGGTCTAGTTCTAGTTCTGCAACTCACACTGA"
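print(len(sequence))
# Complement of each base
pairs = {"A":"T", "T":"A", "G":"C", "C":"G"}
# Slide a 6-letter window over the sequence
for i in range(len(sequence) - 6 + 1):
    pal = True
    # Compare the two outer base pairs of the window (letters 1/6 and 2/5);
    # the middle pair (letters 3/4) is not compared
    for j in range(2):
        if pairs[sequence[i+j]] != sequence[i+5-j]:
            pal = False
            break
    if pal:
        print(sequence[i : i+6])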

It returns:

704
GATCTC
GAGCTC
GCAGGC
GTTCAC
GAGATC
TCAAGA
AAATTT
GACGTC
CAGTTG
TGGACA
AAGATT
CTTCAG
CCAAGG
CGACCG
TTGGAA
CTCGAG
TCTTGA
CTTGAG
TGAGCA
CGGGCG
ATAGAT
ACGGGT
TCCAGA
CTCGAG
TCGAGA
TGTTCA
GTTCAC
GGCACC
AGATCT
CACCTG
GCCTGC
GACTTC
CAGATG
AGAACT
TCAAGA
GAAGTC
TCAGGA
AGGACT
TCTGGA
TGTTCA
TTCAAA
TCAAGA
GAGGTC
AGGTCT
TAGATA
AGTTCT
AGTTCT

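I want to know whether these substrings are located next to "[ATCG]CC" or "[ATCG]GG". I would like to find the position of each palindrome in the sequence (for example, from the i-th to the (i+5)-th letter, since each palindrome has length 6) and then check whether the (i+6)-th to (i+8)-th letters are [ATCG]CC or [ATCG]GG.

Do you know how I could write such a script? Or do you have a better approach? Thanks, everyone.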


2 Answers

Just add a few extra checks:

sequence="GATCTCTATACCAACTCAAAATGAAGACTCTTCTTTACACTTTCGAGCTCAGCAGGCTTACCGAGAAGAGTCGTCGTTCACATCCCCCCCTGTGCGAGATCAAGAAATTTGGCGACGTCGGCTTATTATCCTCCGCTGTCAATCAGTTGGACACATCTCTCCGGTCACTGCCGGACAAGCCAACCGAAGATTCGATTCTTCAGCAGCTTATCGACATTGCTGGTGGTGAAAAGCCAAGGCACAGCATCATAGTTGCGACCAATACGTCATACGACCGAGAGACATTGGTAAAGATCCTTCAACGATTCCCATACACCATACCTGGTCTGTCAGATTCAGGCTTGGAATCAGAAACACTCGAGGCTCTTGAGCACATCGCTTTTGCATTAGCCGGGCGATTAGCTCATAGATTTGACTACGGGTTCAATCCAGAGGCCAGTATCGTTCAACACCTCGAGATGTTCACCACCCTTTGGCACCAAAGATCTGCATTACCACCTGCGCCTGCCCCGTATCGACTTCCCGTTCCCGTCAATCAAGGAAGAGTCTCCTCATCAGATGATGGCTCTGATACTGAGTCAGAACTGGATGAAAAATACCACAACATCAAGAAGTCAGGACTTTGGAGGTTTCTGGATATGTTCAAAATGAACTTCAAGAGGTCTTAGATAACGGTCTAGTTCTAGTTCTGCAACTCACACTGA"
print(len(sequence))
pairs = {"A":"T", "T":"A", "G":"C", "C":"G"}
ans = []
# Stop 9 letters before the end so the 3-letter flank always exists
for i in range(len(sequence) - 9 + 1):
    pal = True
    for j in range(2):
        if pairs[sequence[i+j]] != sequence[i+5-j]:
            pal = False
            break
    if not pal:
        continue

    # The flank is sequence[i+6 : i+9]: its first letter may be any base,
    # its last two letters must be "CC" or "GG"
    if (sequence[i+7] == sequence[i+8]) and (sequence[i+7] in ('C', 'G')):
        print(sequence[i : i+9])
        ans.append(sequence[i : i+9])
    else:
        print(sequence[i : i+6] + " (X)")
print("Count of answer: %d" % len(ans))

Output:

704
GATCTC (X)
GAGCTC (X)
GCAGGC (X)
GTTCAC (X)
GAGATC (X)
TCAAGA (X)
AAATTT (X)
GACGTC (X)
CAGTTG (X)
TGGACA (X)
AAGATT (X)
CTTCAG (X)
CCAAGG (X)
CGACCG (X)
TTGGAA (X)
CTCGAG (X)
TCTTGA (X)
CTTGAG (X)
TGAGCA (X)
CGGGCG (X)
ATAGAT (X)
ACGGGT (X)
TCCAGA (X)
CTCGAG (X)
TCGAGA (X)
TGTTCA (X)
GTTCAC (X)
GGCACC (X)
AGATCT (X)
CACCTG (X)
GCCTGCCCC
GACTTC (X)
CAGATG (X)
AGAACT (X)
TCAAGA (X)
GAAGTCAGG
TCAGGA (X)
AGGACT (X)
TCTGGA (X)
TGTTCA (X)
TTCAAA (X)
TCAAGA (X)
GAGGTC (X)
AGGTCT (X)
TAGATA (X)
AGTTCT (X)
AGTTCT (X)
Count of answer: 2
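
The question also asks where each hit sits in the sequence, which the output above does not show. Here is a minimal sketch that records positions, assuming the 1-based, inclusive numbering used in the question (i-th to (i+5)-th letter for the palindrome, (i+6)-th to (i+8)-th for the flank). The names `PAIRS` and `flanked_hits` are made up for illustration, and the window test is the same outer-pair comparison used in the code above:

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def flanked_hits(sequence, size=6, checked_pairs=2):
    """Return (start, end, text) for each window followed by [ATCG]CC or [ATCG]GG.

    start is the 1-based position of the first letter of the 6-letter window,
    end is the 1-based position of the last flank letter (both inclusive).
    """
    hits = []
    # Leave room for the window plus the 3-letter flank.
    for i in range(len(sequence) - (size + 3) + 1):
        # Same test as in the thread: only the outer base pairs are compared.
        if any(PAIRS[sequence[i + j]] != sequence[i + size - 1 - j]
               for j in range(checked_pairs)):
            continue
        flank = sequence[i + size : i + size + 3]
        # First flank letter may be any base, the last two must be CC or GG.
        if flank[0] in PAIRS and flank[1:] in ("CC", "GG"):
            hits.append((i + 1, i + size + 3, sequence[i : i + size + 3]))
    return hits


for start, end, text in flanked_hits("AAGCCTGCCCCA"):
    print(start, end, text)
```

On the toy input `"AAGCCTGCCCCA"` this prints `3 11 GCCTGCCCC`: the window covers letters 3 to 8 and the flank covers letters 9 to 11. Subtract 1 from `start` if you prefer Python's 0-based indices.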

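I'm not sure I have understood your question correctly, but assuming the values you get are some kind of genetic palindrome and you then want to check the three letters that follow each one (correct me if I'm wrong), a simple solution would look like this: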

sequence="GATCTCTATACCAACTCAAAATGAAGACTCTTCTTTACACTTTCGAGCTCAGCAGGCTTACCGAGAAGAGTCGTCGTTCACATCCCCCCCTGTGCGAGATCAAGAAATTTGGCGACGTCGGCTTATTATCCTCCGCTGTCAATCAGTTGGACACATCTCTCCGGTCACTGCCGGACAAGCCAACCGAAGATTCGATTCTTCAGCAGCTTATCGACATTGCTGGTGGTGAAAAGCCAAGGCACAGCATCATAGTTGCGACCAATACGTCATACGACCGAGAGACATTGGTAAAGATCCTTCAACGATTCCCATACACCATACCTGGTCTGTCAGATTCAGGCTTGGAATCAGAAACACTCGAGGCTCTTGAGCACATCGCTTTTGCATTAGCCGGGCGATTAGCTCATAGATTTGACTACGGGTTCAATCCAGAGGCCAGTATCGTTCAACACCTCGAGATGTTCACCACCCTTTGGCACCAAAGATCTGCATTACCACCTGCGCCTGCCCCGTATCGACTTCCCGTTCCCGTCAATCAAGGAAGAGTCTCCTCATCAGATGATGGCTCTGATACTGAGTCAGAACTGGATGAAAAATACCACAACATCAAGAAGTCAGGACTTTGGAGGTTTCTGGATATGTTCAAAATGAACTTCAAGAGGTCTTAGATAACGGTCTAGTTCTAGTTCTGCAACTCACACTGA"

pairs = {"A":"T", "T":"A", "G":"C", "C":"G"}

keeper = []
for i in range(len(sequence) - 6 + 1):
    pal = True
    for j in range(2):
        if pairs[sequence[i+j]] != sequence[i+5-j]:
            pal = False
            break
    if pal:
        # Store the 6-letter window together with its (start, end) slice indices
        keeper.append((sequence[i : i+6], (i, i+6)))

# The eight allowed 3-letter flanks: [ATCG]CC and [ATCG]GG
possible_ends = [a + 'CC' for a in "ATCG"]
possible_ends.extend([a + 'GG' for a in "ATCG"])

final = []

for the_sequence, (start, end) in keeper:
    flank = sequence[end : end+3]
    # Compare the flank itself, so that a window at the very end of the
    # sequence (fewer than 3 letters after it) cannot match by accident
    if flank in possible_ends:
        final.append(the_sequence + flank)

print(final)

Output:

['GCCTGCCCC', 'GAAGTCAGG']

I hope and believe this is the desired output.
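
One caveat that applies to every loop in this thread: `range(2)` compares only the two outer base pairs, so the middle pair of each 6-letter window is never tested. That is why GATCTC is reported even though its middle letters T and C do not pair. If a strict reverse-complement palindrome is what you are after (an assumption on my part, the question does not say), a sketch along these lines may help. `COMPLEMENT`, `is_revcomp_palindrome`, and `strict_palindromes` are illustrative names, not part of the original script:

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def is_revcomp_palindrome(kmer):
    """True if kmer reads the same as its own reverse complement."""
    return kmer == kmer.translate(COMPLEMENT)[::-1]


def strict_palindromes(sequence, size=6):
    """Return (0-based start, window) for every strict palindromic window."""
    return [
        (i, sequence[i : i + size])
        for i in range(len(sequence) - size + 1)
        if is_revcomp_palindrome(sequence[i : i + size])
    ]


print(is_revcomp_palindrome("GAGCTC"))
print(is_revcomp_palindrome("GATCTC"))
print(strict_palindromes("TTGAGCTCAA"))
```

With this stricter test GAGCTC still qualifies, GATCTC does not, and the toy sequence `"TTGAGCTCAA"` yields the single window GAGCTC at 0-based index 2. The flank check for [ATCG]CC or [ATCG]GG can then be applied to the surviving windows exactly as before.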
