汉明德斯坦逆

wordFromString = givenString[i:i+wordSize] mismatches = sum(ch1 != ch2 for ch1, ch2 in zip(wordFromString, generatedWord)) if mismatches <= d: #count that generated word in a list for future use #(only need the most repeated)

3条回答

网友
1楼 · 编辑于 2024-09-27 21:34:32

设给定的汉明距离为D，设w为“word”子串。从w，您可以通过深度限制depth-first search生成距离≤D的所有单词：
def dfs(w, D): seen = set([w]) # keep track of what we already generated stack = [(w, 0)] while stack: x, d = stack.pop() yield x if d == D: continue # generate all mutations at Hamming distance 1 for i in xrange(len(x)): for c in set("ACGT") - set(x[i]) y = x[:i] + c + x[i+1:] if y not in seen: seen.add(y) stack.append((y, d + 1))
（这绝不会很快，但可能会给人以启发。）

网友
2楼 · 编辑于 2024-09-27 21:34:32

如果我正确地理解了你的问题，你想确定基因组中得分最高的k-mers。k-mer的得分是它在基因组中出现的次数加上汉明距离小于m的任何k-mer在基因组中出现的次数。请注意，这假设您只对出现在您的基因组中的k-mer感兴趣（正如@j_random_hacker所指出的）。在
您可以通过四个基本步骤来解决此问题：
确定基因组中的所有k-mer。在
计算每个k-mer在G中出现的次数。在
对于每对k-mers（K1，K2），如果K1和{}的汉明距离小于m，则增加它们的计数。在
找到maxk-mer及其计数。在
下面是Python代码示例：
from itertools import combinations from collections import Counter # Hamming distance function def hamming_distance(s, t): if len(s) != len(t): raise ValueError("Hamming distance is undefined for sequences of different lengths.") return sum( s[i] != t[i] for i in range(len(s)) ) # Main function # - k is for k-mer # - m is max hamming distance def hamming_kmer(genome, k, m): # Enumerate all k-mers kmers = [ genome[i:i+k] for i in range(len(genome)-k + 1) ] # Compute initial counts initcount = Counter(kmers) kmer2count = dict(initcount) # Compare all pairs of k-mers for kmer1, kmer2 in combinations(set(kmers), 2): # Check if the hamming distance is small enough if hamming_distance(kmer1, kmer2) <= m: # Increase the count by the number of times the other # k-mer appeared originally kmer2count[kmer1] += initcount[kmer2] kmer2count[kmer2] += initcount[kmer1] return kmer2count # Count the occurrences of each mismatched k-mer genome = 'ACGTTGCATGTCGCATGATGCATGAGAGCT' kmer2count = hamming_kmer(genome, 4, 1) # Print the max k-mer and its count print max(kmer2count.items(), key=lambda (k,v ): v ) # Output => ('ATGC', 5)

网友
3楼 · 编辑于 2024-09-27 21:34:32

def mutations(word, hamming_distance, charset='ATCG'):
    for indices in itertools.combinations(range(len(word)), hamming_distance):
        for replacements in itertools.product(charset, repeat=hamming_distance):
            mutation = list(word)
            for index, replacement in zip(indices, replacements):
                mutation[index] = replacement
            yield "".join(mutation)

此函数生成汉明距离小于或等于给定数字的单词的所有突变。它是相对有效的，并且不检查无效的单词。然而，有效突变可以出现多次。如果希望每个元素都是唯一的，请使用集合。在

相关问题更多 >

编程相关推荐

热门问题

热门文章