Finding the best-matching sequences

Posted 2024-06-03 01:31:59


I have two sequence files. Say ham1.txt:

AAACCCTTTGGG
AGGTACTTTTTT
TCTCTTTTTTTT

and so on.

And ham2.txt:

^{pr2}$

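I want to match each sequence in ham1.txt with a sequence in ham2.txt, based on which pair has the smallest Hamming distance. My Python code prints the Hamming distance for every pair, but I only want the best pair. Here is my code: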

def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

with open('ham1.txt') as file1:
    for s1 in file1:
        s1 = s1.strip()  # drop the trailing newline before comparing
        with open('ham2.txt') as file2:
            for s2 in file2:
                s2 = s2.strip()
                dist = hamming_distance(s1, s2)
                print(s1, s2, dist)

Could you suggest how to change it? Thanks.


3 Answers

You should take a look at itertools.product:

In [7]:

L1 = ['AAACCCTTTGGG',
      'AGGTACTTTTTT',
      'TCTCTTTTTTTT']
L2 = ['AAACCCTTTGGG',
      'GAGAGGGAGGGC',
      'AGGTACTTTTTT',
      'CTCTTAATTTCC',
      'TCTCTTTTTTTT',
      'GTTTTTAAAAAA']
def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
import itertools
res = [[hamming_distance(*item), item[0], item[1]] for item in itertools.product(L1, L2)]
sorted(res)[0]
Out[7]:
[0, 'AAACCCTTTGGG', 'AAACCCTTTGGG']
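
Sorting the whole list only to take its first element does more work than needed. As a minimal sketch (reusing the sample lists L1 and L2 from this answer; the names best_pair and best_dist are illustrative), min with a key function finds the closest pair in a single pass:

```python
import itertools


def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


# Sample data copied from the answer above.
L1 = ['AAACCCTTTGGG', 'AGGTACTTTTTT', 'TCTCTTTTTTTT']
L2 = ['AAACCCTTTGGG', 'GAGAGGGAGGGC', 'AGGTACTTTTTT',
      'CTCTTAATTTCC', 'TCTCTTTTTTTT', 'GTTTTTAAAAAA']

# min() scans every (s1, s2) combination once and keeps the first pair
# with the smallest distance, so no sorted intermediate list is needed.
best_pair = min(itertools.product(L1, L2),
                key=lambda pair: hamming_distance(*pair))
best_dist = hamming_distance(*best_pair)
print(best_dist, *best_pair)
```

If several pairs share the smallest distance, min returns the first one in iteration order, which may differ from the pair that sorted(res)[0] would pick.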

I would use reduce:

from functools import reduce


def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == '__main__':
    with open('h1.txt') as f:
        f1 = f.read().splitlines()

    with open('h2.txt') as f:
        f2 = f.read().splitlines()

    for line in f1:
        print(line, reduce(lambda x, y: x if hamming_distance(line, y) > hamming_distance(line, x) else y, f2))

Output:

^{pr2}$
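
The same per-sequence search can also be written with min and a key function, which some readers may find easier to follow than the reduce with a lambda. The sketch below is one possible variant, not the answerer's code: read_sequences and best_matches are hypothetical helper names, and the sample lines are the sequences used in the first answer.

```python
def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def read_sequences(lines):
    """Strip whitespace from each line and drop blank lines.

    Accepts any iterable of strings, e.g. an open file object.
    """
    return [line.strip() for line in lines if line.strip()]


def best_matches(seqs1, seqs2):
    """For each sequence in seqs1, return (s1, closest s2, distance)."""
    results = []
    for s1 in seqs1:
        # On ties, min() keeps the first candidate it sees.
        s2 = min(seqs2, key=lambda cand: hamming_distance(s1, cand))
        results.append((s1, s2, hamming_distance(s1, s2)))
    return results


# In-memory stand-ins for the two files (assumed sample data).
ham1 = read_sequences(["AAACCCTTTGGG\n", "AGGTACTTTTTT\n", "TCTCTTTTTTTT\n", "\n"])
ham2 = read_sequences(["AAACCCTTTGGG\n", "GAGAGGGAGGGC\n", "AGGTACTTTTTT\n",
                       "CTCTTAATTTCC\n", "TCTCTTTTTTTT\n", "GTTTTTAAAAAA\n"])

for s1, s2, dist in best_matches(ham1, ham2):
    print(s1, s2, dist)
```

With real files, pass the open file object directly, for example read_sequences(f) inside a with open('ham1.txt') as f: block. Note that the reduce version above keeps the later candidate on a tie, while min keeps the earlier one.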

I generated the following list:

0 AAACCCTTTGGG AAACCCTTTGGG
0 AGGTACTTTTTT AGGTACTTTTTT
0 TCTCTTTTTTTT TCTCTTTTTTTT
6 AGGTACTTTTTT TCTCTTTTTTTT
6 TCTCTTTTTTTT AGGTACTTTTTT
7 AAACCCTTTGGG AGGTACTTTTTT
7 AGGTACTTTTTT AAACCCTTTGGG
8 AAACCCTTTGGG TCTCTTTTTTTT
8 AGGTACTTTTTT CTCTTAATTTCC
8 TCTCTTTTTTTT AAACCCTTTGGG
8 TCTCTTTTTTTT CTCTTAATTTCC
9 AAACCCTTTGGG GAGAGGGAGGGC
9 TCTCTTTTTTTT GTTTTTAAAAAA
10 AAACCCTTTGGG CTCTTAATTTCC
11 AGGTACTTTTTT GAGAGGGAGGGC
11 AGGTACTTTTTT GTTTTTAAAAAA
12 AAACCCTTTGGG GTTTTTAAAAAA
12 TCTCTTTTTTTT GAGAGGGAGGGC

I think this is what you need, right?

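To get there I took a few liberties. First, I converted the data streams/strings into lists of values. Then I formed all possible combinations of ham1 and ham2 and built a new list that also contains the Hamming distance for each pair. Finally, I sorted that list.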

Does this help? If not, ask and I will help you sort it out ;)

The code used is below.

^{pr2}$
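
The steps described in this answer (lists of sequences, all ham1/ham2 combinations with their Hamming distance, then a sort) can be sketched as follows. This is a reconstruction under assumptions, not the answerer's original code: the variable names are illustrative and the sequences are hard-coded rather than read from ham1.txt and ham2.txt.

```python
def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


# Step 1: the sequences as plain lists (assumed sample data).
ham1 = ['AAACCCTTTGGG', 'AGGTACTTTTTT', 'TCTCTTTTTTTT']
ham2 = ['AAACCCTTTGGG', 'GAGAGGGAGGGC', 'AGGTACTTTTTT',
        'CTCTTAATTTCC', 'TCTCTTTTTTTT', 'GTTTTTAAAAAA']

# Step 2: every ham1/ham2 combination, stored with its distance first
# so that sorting the tuples orders them by distance, then by sequence.
rows = [(hamming_distance(s1, s2), s1, s2) for s1 in ham1 for s2 in ham2]

# Step 3: sort; the closest pairs end up at the top.
rows.sort()

for dist, s1, s2 in rows:
    print(dist, s1, s2)
```

Because the distance is the first tuple element and is an integer, the rows sort numerically, so a distance of 10 comes after 9 rather than before it.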
