How can I efficiently sort a list of strings by edit distance?

def suggest(dic, word, distance, maxSugestions=5):
    list = []
    for i in range(1, 200):
        for word1 in sorted(dic):
            if distance(word1, word) == i:
                list.append(word1)
                if len(list) == maxSugestions:
                    return list

def levDistance(str1, str2):
    matrix = [[0 for x in range(len(str2) + 1)] for x in range(len(str1) + 1)]
    for i in range(len(str1) + 1):
        for j in range(len(str2) + 1):
            if i == 0:
                matrix[i][j] = j
            elif j == 0:
                matrix[i][j] = i
            elif str1[i-1] == str2[j-1]:
                matrix[i][j] = matrix[i-1][j-1]
            else:
                matrix[i][j] = 1 + min(matrix[i][j-1], matrix[i-1][j], matrix[i-1][j-1])
    return matrix[len(str1)][len(str2)]

2 Answers

User

#1 · Edited 2024-09-30 18:19:11

I tried this approach; I hope it works for you.

def edit_distance(word, string_to_take_distance_with = "someString"):
    '''
    Return the edit distance between two words.

    word                         : string 1 (varies per call)
    string_to_take_distance_with : string 2 (fixed reference string)
    '''
    length_of_string  = len(word)+1
    length_of_string2 = len(string_to_take_distance_with)+1

    tbl = {}
    for i in range(length_of_string): tbl[i,0]=i
    for j in range(length_of_string2): tbl[0,j]=j
    for i in range(1, length_of_string):
        for j in range(1, length_of_string2):
            cost = 0 if word[i-1] == string_to_take_distance_with[j-1] else 1
            tbl[i,j] = min(tbl[i, j-1]+1, tbl[i-1, j]+1, tbl[i-1, j-1]+cost)

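    # Index the final cell explicitly instead of relying on leftover loop variables
    return tbl[length_of_string - 1, length_of_string2 - 1]


# Sorts by distance to the default reference string "someString"
sorted(["hello", "helo", "aen"], key=edit_distance)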
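Because `sorted` passes a single argument to the key function, the call above ranks words by their distance to the default `"someString"`. To rank against an arbitrary target word, one option is to bind the target with `functools.partial`. The following is a minimal sketch, not the answerer's code: `levenshtein` and `sort_by_distance` are illustrative names, and the distance function keeps only two rows of the table rather than the full dictionary.

```python
from functools import partial


def levenshtein(a, b):
    """Levenshtein distance, keeping only two rows of the DP table."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(current[j - 1] + 1,        # insertion
                               previous[j] + 1,           # deletion
                               previous[j - 1] + cost))   # substitution or match
        previous = current
    return previous[-1]


def sort_by_distance(words, target):
    """Return words ordered by edit distance to target (ties keep input order)."""
    return sorted(words, key=partial(levenshtein, b=target))


print(sort_by_distance(["hello", "helo", "aen"], "help"))
# ['helo', 'hello', 'aen']
```

The key function is called once per element, so each distance is computed only once regardless of how many comparisons the sort performs.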

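User

#2 · Edited 2024-09-30 18:19:11

You are recomputing the same distances on every iteration, which is the main problem. Try computing each distance only once, then take the number of suggestions given by maxSugestions: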

def suggest(dic, word, distance, maxSugestions=5):
    # Compute each distance once, sort the (distance, word) pairs, keep the closest matches
    return [i[1] for i in sorted([(distance(word1, word), word1) for word1 in dic])[:maxSugestions]]
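When only a handful of suggestions is needed, sorting the whole dictionary is more work than necessary. A possible variation, which is not part of the answer above, is to select the closest entries with `heapq.nsmallest`. In this sketch `suggest_top` is a hypothetical name, and the length-difference lambda in the usage line is only a stand-in for a real edit distance function.

```python
import heapq


def suggest_top(dic, word, distance, max_suggestions=5):
    """Return up to max_suggestions words from dic that are closest to word."""
    # Each distance is computed exactly once; the (distance, word) tuples
    # break ties between equal distances alphabetically.
    scored = ((distance(candidate, word), candidate) for candidate in dic)
    closest = heapq.nsmallest(max_suggestions, scored)
    return [candidate for _, candidate in closest]


# Stand-in distance for demonstration only: difference in length.
print(suggest_top(["aaa", "b", "cc", "dddd"], "xx",
                  lambda a, b: abs(len(a) - len(b)), max_suggestions=2))
# ['cc', 'aaa']
```

Unlike the loop in the question, this also returns a (shorter) list when the dictionary holds fewer than `max_suggestions` words.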

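Then there is your distance implementation itself. If you still want it to be faster, it is better to use the editdistance library (or any other C-based implementation, if necessary) rather than a pure Python implementation. For me it ran 20x faster than the Python implementation. From the original answer: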

I used a C-based implementation of Levenshtein distance, the editdistance library. While researching, I found that C-based implementations are readily available for many such tasks, for example matrix multiplication and search algorithms. Besides, you can always write a module in C and use it from Python. editdistance.eval('banana', 'bahama') took only 1.71 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each), compared with my own function levenshteinDistance('banana', 'bahama'), which took 34.4 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each). That's a 20x speedup.

By the way, I am not the author of that package; I found it by searching Google for "C-based Levenshtein distance implementation".
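As a rough illustration of the suggestion above, the distance function can be swapped for `editdistance.eval` when the package is available. This sketch assumes the third-party editdistance package has been installed (for example with `pip install editdistance`) and falls back to a pure-Python function otherwise; `distance` and `suggest_fast` are hypothetical names chosen for the example.

```python
try:
    import editdistance  # third-party C-based package; assumed to be installed

    def distance(a, b):
        return editdistance.eval(a, b)
except ImportError:
    def distance(a, b):
        """Pure-Python fallback: Levenshtein distance with a two-row table."""
        previous = list(range(len(b) + 1))
        for i, ch_a in enumerate(a, start=1):
            current = [i]
            for j, ch_b in enumerate(b, start=1):
                cost = 0 if ch_a == ch_b else 1
                current.append(min(current[j - 1] + 1,
                                   previous[j] + 1,
                                   previous[j - 1] + cost))
            previous = current
        return previous[-1]


def suggest_fast(dic, word, max_suggestions=5):
    """Closest words to `word`, computing each distance once."""
    ranked = sorted((distance(candidate, word), candidate) for candidate in dic)
    return [candidate for _, candidate in ranked[:max_suggestions]]


print(distance("banana", "bahama"))
```

The actual speedup depends on the machine and on word lengths, so it is worth measuring with `timeit` on your own data.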
