用Try/Except和Loops Python3进行DNA基序计数

import itertools def combination(k): bases=['A','T','G','C'] combo=[''.join(p) for p in itertools.product(bases, repeat=k)] return combo def hammingDistance(Pattern, seq): if Pattern == seq: return 0 else: dist=0 for i in range(len(seq)): if Pattern[i] != seq[i]: dist += 1 return dist def motif_enumeration(k, d, DNA): combos = combination(k) global pattern for combo in combos: try: inner_loop(k, d, DNA, combo) except: continue return set(pattern) def inner_loop(k, d, DNA, combo): global pattern for strings in DNA: inner_loop_two(k, d, DNA, combo, strings) def inner_loop_two(k, d, DNA, combo, strings): global pattern c=0 while c+k < len(DNA[0]): print(combo, strings[c:c+k], hammingDistance(combo, strings[c:c+k])) if d >= hammingDistance(combo, strings[c:c+k]) and strings == DNA[len(DNA)-1]: #if we've reached the last string and the condition is met, #that means that the combo is suitable for each string of DNA pattern += [combo] elif d >= hammingDistance(combo, strings[c:c+k]): #condition is met for one string, now move onto next break elif d < hammingDistance(combo, strings[c:c+k]) and c+k == len(DNA[0])-1: #Name error causes this inner loop two to crash, thus causing the first inner loop #to pass int(a) elif d < hammingDistance(combo, strings[c:c+k]): #change the window to see if the combo is valid later in the string c += 1 pattern = [] DNA=['ATTTGGC', 'TGCCTTA', 'CGGTATC', 'GAAAATT'] print(motif_enumeration(3,1,DNA)) print(pattern)

1条回答

网友

1楼 · 发布于 2024-07-05 14:09:10

这个逻辑的核心部分是，如果组合匹配目标字符串的任何位置，我们希望将一个combo收集到模式集中。我们可以使用Python的all和any函数来实现这一点。这些函数的工作效率很高，因为一旦结果确定，它们就停止测试。在

import itertools

def combination(k):
    return (''.join(p) for p in itertools.product('ATCG', repeat=k))

def hamming_distance(pattern, seq):
    return sum(c1 != c2 for c1, c2 in zip(pattern, seq))

def window(s, k):
    for i in range(1 + len(s) - k):
        yield s[i:i+k]

def motif_enumeration(k, d, DNA):
    pattern = set()
    for combo in combination(k):
        if all(any(hamming_distance(combo, pat) <= d 
                for pat in window(string, k)) for string in DNA):
            pattern.add(combo)
    return pattern

DNA = ['ATTTGGC', 'TGCCTTA', 'CGGTATC', 'GAAAATT']
print(motif_enumeration(3, 1, DNA))

输出

^{pr2}$

我对你的代码做了一些其他的修改。我们可以通过给sum函数传递一个生成器来有效地计算Hamming距离。我们可以通过使用生成器将组合元组转换为字符串而不是将它们放入列表中，从而节省时间和内存。在

motif_enumeration函数可以进一步浓缩为一个集合的理解，但我必须承认它相当密集，甚至比以前的版本更难阅读。不过，它的效率可能会稍微高一点。在

def motif_enumeration(k, d, DNA):
    return {combo for combo in combination(k)
        if all(any(hamming_distance(combo, pat) <= d 
            for pat in window(string, k)) for string in DNA)}

这里有一个更可读的版本，我给了motif_enumeration一个辅助函数in_window来执行内部测试。在

# Return True if combo is within d in any window of string
def in_window(combo, string, k, d):
    return any(hamming_distance(combo, pat) <= d for pat in window(string, k))

def motif_enumeration(k, d, DNA):
    pattern = set()
    for combo in combination(k):
        if all(in_window(combo, string, k, d) for string in DNA):
            pattern.add(combo)
    return pattern

相关问题更多 >

编程相关推荐

热门问题

热门文章