目的是在给定完整的mRNA序列和氨基酸序列的情况下,得到mRNA的编码序列。然后把这些都用密码子的形式。 我觉得我已经找到了可能的密码子列表。我只是不知道如何系统地匹配给定的mRNA序列。这是我到目前为止的情况。你知道吗
xAA = 'MDFFASGLPLVTEETPSGEAGSEEDDEVVAMIKELLDTRIRPTVQEDGGDVIYKGFEDGIVQLKLQGSCTSCPSSIITLKNGIQNMLQFYIPEVEGVEQVMDDESDEKEANSP'
xmRNA = 'GUUCCCCGGCCUCUCUUGGUCAGGGUGACGCAGUAGCCUGCAAACCUCGGCGCGUAGGCCACCGCACUUAUCCGCAGCAGGACCGCCCGCAGCCGGUAGGGUGGGCUCUUCCCAGUGCCCGCCCAGCUACCGGCCAGCCUGCGGCUGCGCAGAUCUUUCGUGGUUCUGUCAGGGAGACCCUUAGGCACUCCGGACUAAGAUGGCGGCGACGGCCAGGCGGGGCUGGGGAGCUGCGGCUGUUGCCGCCGGGCUGCGCAGGCGGUUCUGUCAUAUGUUGAAGAAUCCAUACACCAUUAAGAAACAGCCUCUGCAUCAGUUUGUACAAAGACCACUUUUCCCACUACCUGCAGCCUUUUAUCACCCAGGCAGUUAUUUAGGAUUGAAGGAGUAAAAAGUGUCUUCUUUGGACCAGAUUUCAUCACUGUCACAAAGGAAAAUGAAGAAUUAGACUGGAAUUUACUGAAACCAGAUAUUUAUGCAACAAUCAUGGACUUCUUUGCAUCUGGCUUACCCCUGGUUACUGAGGAAACACCUUCAGGAGAAGCAGGAUCUGAAGAAGAUGAUGAAGUUGUGGCAAUGAUUAAGGAAUUGUUAGAUACUAGAAUACGGCCAACUGUGCAGGAAGAUGGAGGGGAUGUAAUCUACAAAGGCUUUGAAGAUGGCAUUGUACAGCUGAAACUCCAGGGUUCUUGUACCAGCUGCCCUAGUUCAAUCAUUACUCUGAAAAAUGGAAUUCAGAACAUGCUGCAGUUUUAUAUUCCGGAGGUAGAAGGCGUAGAACAGGUUAUGGAUGAUGAAUCAGAUGAAAAAGAAGCAAACUCACCUUAAAAUAAUCUGGAUUUUCUUUGGGCAUAACAGUCAGACUUGUUGAUAAUAUAUAUCAAGUUUUUAUUAUUAAUAUGCUGAGGAACUUGAAGAUUAAUAAAAUAUGCUCUUCAGAGAAUGAUAUAUAAAA'
d = {'mRNA': ['UUU','UUC','UUA','UUG','UCU','UCC','UCA','UCG','UAU','UAC','UAA','UAG','UGU','UGC','UGA','UGG','CUU','CUC','CUA','CUG','CCU','CCC','CCA','CCG','CAU','CAC','CAA','CAG','CGU','CGC','CGA','CGG','AUU','AUC','AUA','AUG','ACU','ACC','ACA','ACG','AAU','AAC','AAA','AAG','AGU','AGC','AGA','AGG','GUU','GUC','GUA','GUG','GCU','GCC','GCA','GCG','GAU','GAC','GAA','GAG','GGU','GGC','GGA','GGG'], 'AA': ['F','F','L','L','S','S','S','S','Y','Y','_','_','C','C','_','W','L','L','L','L','P','P','P','P','H','H','Q','Q','R','R','R','R','I','I','M','M','T','T','T','T','N','N','K','K','S','S','R','R','V','V','V','V','A','A','A','A','D','D','E','E','G','G','G','G']}
AA= pandas.DataFrame(data=d)
for i in xAA:
codons = list(AA.mRNA.loc[AA['AA'] == i])
print codons
这是输出:
['AUA', 'AUG']
['GAU', 'GAC']
['UUU', 'UUC']
['UUU', 'UUC']
['GCU', 'GCC', 'GCA', 'GCG']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['GGU', 'GGC', 'GGA', 'GGG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['CCU', 'CCC', 'CCA', 'CCG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['GUU', 'GUC', 'GUA', 'GUG']
['ACU', 'ACC', 'ACA', 'ACG']
['GAA', 'GAG']
['GAA', 'GAG']
['ACU', 'ACC', 'ACA', 'ACG']
['CCU', 'CCC', 'CCA', 'CCG']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['GGU', 'GGC', 'GGA', 'GGG']
['GAA', 'GAG']
['GCU', 'GCC', 'GCA', 'GCG']
['GGU', 'GGC', 'GGA', 'GGG']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['GAA', 'GAG']
['GAA', 'GAG']
['GAU', 'GAC']
['GAU', 'GAC']
['GAA', 'GAG']
['GUU', 'GUC', 'GUA', 'GUG']
['GUU', 'GUC', 'GUA', 'GUG']
['GCU', 'GCC', 'GCA', 'GCG']
['AUA', 'AUG']
['AUU', 'AUC']
['AAA', 'AAG']
['GAA', 'GAG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['GAU', 'GAC']
['ACU', 'ACC', 'ACA', 'ACG']
['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG']
['AUU', 'AUC']
['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG']
['CCU', 'CCC', 'CCA', 'CCG']
['ACU', 'ACC', 'ACA', 'ACG']
['GUU', 'GUC', 'GUA', 'GUG']
['CAA', 'CAG']
['GAA', 'GAG']
['GAU', 'GAC']
['GGU', 'GGC', 'GGA', 'GGG']
['GGU', 'GGC', 'GGA', 'GGG']
['GAU', 'GAC']
['GUU', 'GUC', 'GUA', 'GUG']
['AUU', 'AUC']
['UAU', 'UAC']
['AAA', 'AAG']
['GGU', 'GGC', 'GGA', 'GGG']
['UUU', 'UUC']
['GAA', 'GAG']
['GAU', 'GAC']
['GGU', 'GGC', 'GGA', 'GGG']
['AUU', 'AUC']
['GUU', 'GUC', 'GUA', 'GUG']
['CAA', 'CAG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['AAA', 'AAG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['CAA', 'CAG']
['GGU', 'GGC', 'GGA', 'GGG']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['UGU', 'UGC']
['ACU', 'ACC', 'ACA', 'ACG']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['UGU', 'UGC']
['CCU', 'CCC', 'CCA', 'CCG']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['AUU', 'AUC']
['AUU', 'AUC']
['ACU', 'ACC', 'ACA', 'ACG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['AAA', 'AAG']
['AAU', 'AAC']
['GGU', 'GGC', 'GGA', 'GGG']
['AUU', 'AUC']
['CAA', 'CAG']
['AAU', 'AAC']
['AUA', 'AUG']
['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
['CAA', 'CAG']
['UUU', 'UUC']
['UAU', 'UAC']
['AUU', 'AUC']
['CCU', 'CCC', 'CCA', 'CCG']
['GAA', 'GAG']
['GUU', 'GUC', 'GUA', 'GUG']
['GAA', 'GAG']
['GGU', 'GGC', 'GGA', 'GGG']
['GUU', 'GUC', 'GUA', 'GUG']
['GAA', 'GAG']
['CAA', 'CAG']
['GUU', 'GUC', 'GUA', 'GUG']
['AUA', 'AUG']
['GAU', 'GAC']
['GAU', 'GAC']
['GAA', 'GAG']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['GAU', 'GAC']
['GAA', 'GAG']
['AAA', 'AAG']
['GAA', 'GAG']
['GCU', 'GCC', 'GCA', 'GCG']
['AAU', 'AAC']
['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC']
['CCU', 'CCC', 'CCA', 'CCG']
如果我加上这里显示的for c循环
codingseq = ""
for i in xAA:
codons = list(AA.mRNA.loc[AA['AA'] == i])
for c in codons:
xmRNA.find(c)
codingseq+= c
这就给了每个组合一个比较分析的方法来找出哪一个最像完整的mRNA序列?你知道吗
AUAAUG
AUAAUGGAUGAC
AUAAUGGAUGACUUUUUC
AUAAUGGAUGACUUUUUCUUUUUC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGAC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGAC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGAC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGAC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGACGGUGGCGGAGGG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGACGGUGGCGGAGGGGGUGGCGGAGGG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGACGGUGGCGGAGGGGGUGGCGGAGGGGAUGAC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGACGGUGGCGGAGGGGGUGGCGGAGGGGAUGACGUUGUCGUAGUG
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGACGGUGGCGGAGGGGGUGGCGGAGGGGAUGACGUUGUCGUAGUGAUUAUC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGACGGUGGCGGAGGGGGUGGCGGAGGGGAUGACGUUGUCGUAGUGAUUAUCUAUUAC
AUAAUGGAUGACUUUUUCUUUUUCGCUGCCGCAGCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGUUAUUGCUUCUCCUACUGCCUCCCCCACCGUUAUUGCUUCUCCUACUGGUUGUCGUAGUGACUACCACAACGGAAGAGGAAGAGACUACCACAACGCCUCCCCCACCGUCUUCCUCAUCGAGUAGCGGUGGCGGAGGGGAAGAGGCUGCCGCAGCGGGUGGCGGAGGGUCUUCCUCAUCGAGUAGCGAAGAGGAAGAGGAUGACGAUGACGAAGAGGUUGUCGUAGUGGUUGUCGUAGUGGCUGCCGCAGCGAUAAUGAUUAUCAAAAAGGAAGAGUUAUUGCUUCUCCUACUGUUAUUGCUUCUCCUACUGGAUGACACUACCACAACGCGUCGCCGACGGAGAAGGAUUAUCCGUCGCCGACGGAGAAGGCCUCCCCCACCGACUACCACAACGGUUGUCGUAGUGCAACAGGAAGAGGAUGACGGUGGCGGAGGGGGUGGCGGAGGGGAUGACGUUGUCGUAGUGAUUAUCUAUUACAAAAAG
请注意,并非所有显示的结果都超过了字符限制。 任何帮助都会很棒!你知道吗
这是一个相当紧凑的解决方案。创建反向词典时没有任何冲突,因此可以执行以下操作:
返回486,这是xAA密码子所在的位置。它可以通过对字符串进行边界检查来进一步充实,如果您只需要最接近的匹配,您可以对子字符串进行组合匹配(我仍然会进行反向映射,然后在该空间中搜索,虽然这样更快),但是是的。我也不知道不映射的丢失字符/三元组有多常见,因此可能需要对这些情况进行修改。你知道吗
这在xmRNA中发现了一个密码子序列,它与xAA匹配。注:在索引34处对d[“AA”]数据进行了修正(“M”替换为“I”),以匹配http://web.expasy.org/站点上使用的翻译。我没有使用
pandas
,只是简单的Python。我只是做了一个简单的测试,试图找到xAA在xmRNA中的位置(以及使用了什么密码子)。即使是很长的序列,它也应该足够快(即使100000个rna也应该接近瞬间)。你知道吗相关问题 更多 >
编程相关推荐