Python: How to correct misspelled names

User

Reply 1 · edited 2024-10-06 07:00:22

First, you should use the Levenshtein distance between strings. I found a link with the following Levenshtein distance algorithm for Python:

# Define Levenshtein distance function (from the mentioned link)
def levenshtein(s1, s2):
    if len(s1) < len(s2):
        return levenshtein(s2, s1)

    if len(s2) == 0:
        return len(s1)

    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1 
            deletions = current_row[j] + 1  
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row

    return previous_row[-1]

Once you have this, write a function that finds the closest match between a given string and the list of correctly spelled names.
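
As a minimal sketch of such a function (the name `closest_match` is hypothetical, and the sample list is borrowed from the other replies; the original poster's version may well differ), one could simply take the candidate with the smallest Levenshtein distance. The `levenshtein` function is repeated here in compact form only so that the snippet runs on its own:

```python
def levenshtein(s1, s2):
    # Same row-by-row dynamic programming as the function above
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            current_row.append(min(previous_row[j + 1] + 1,       # insertion
                                   current_row[j] + 1,            # deletion
                                   previous_row[j] + (c1 != c2))) # substitution
        previous_row = current_row
    return previous_row[-1]


def closest_match(word, candidates):
    # Hypothetical helper: pick the candidate with the smallest edit distance.
    # On a tie, min() keeps the first candidate in list order.
    return min(candidates, key=lambda name: levenshtein(word, name))


correct = ['New York', 'Amsterdam', 'Barcelona', 'Berlin', 'Prague']
print(closest_match('bercelona', correct))  # Barcelona
```

Note that this sketch always returns some candidate, even for a word that resembles none of them; if that matters, a maximum allowed distance could be added.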


Finally, you just need to loop over the first list with this function. The result is:

>>> print(final_list)
['Barcelona', 'Amsterdam', 'Prague']

User

Reply 2 · edited 2024-10-06 07:00:22

You can use the built-in Ratcliff/Obershelp algorithm (difflib.SequenceMatcher in the standard library):

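import difflib

def is_similar(first, second, ratio):
    # True when the similarity ratio (0.0 to 1.0) exceeds the given threshold
    return difflib.SequenceMatcher(None, first, second).ratio() > ratio


first = ['bercelona', 'emstrdam', 'Praga']
second = ['New York', 'Amsterdam', 'Barcelona', 'Berlin', 'Prague']

# Keep every correctly spelled name that is similar enough to a misspelled one
result = [s for f in first for s in second if is_similar(f, s, 0.7)]
print(result)
['Barcelona', 'Amsterdam', 'Prague']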

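Here 0.7 is the similarity threshold. You will probably need to test against your own data and tune this value. The ratio shows how similar two strings are (1 means identical strings, 0 means completely different strings).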
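
One thing to watch with the list comprehension above: it keeps every pair above the threshold, so a misspelled word can contribute several names or none at all. As a hedged alternative, the standard library's `difflib.get_close_matches` uses the same `SequenceMatcher` ratio and returns the best-scoring candidates first. The helper name `best_match` below is hypothetical, a sketch rather than part of the original answer:

```python
import difflib


def best_match(word, candidates, cutoff=0.7):
    # get_close_matches returns up to n candidates whose ratio is at least
    # `cutoff`, ordered from most to least similar
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


first = ['bercelona', 'emstrdam', 'Praga']
second = ['New York', 'Amsterdam', 'Barcelona', 'Berlin', 'Prague']

# Exactly one entry per misspelled word; None when nothing is close enough
print([best_match(f, second) for f in first])
```

This keeps the output aligned with the input list, which makes it easier to see which words could not be corrected.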

User

Reply 3 · edited 2024-10-06 07:00:22

This could be a good use case for an excellent package called fuzzywuzzy.

from fuzzywuzzy import fuzz

bad = ['bercelona', 'emstrdam', 'Praga']

good = ['New York', 'Amsterdam', 'Barcelona', 'Berlin', 'Prague']

# you can even set a custom threshold and only return a match if its
# score is above that threshold
def correctspell(word, spellcorrect, thresh=70):
    # fuzz.ratio returns an integer similarity score from 0 to 100
    scores = [fuzz.ratio(x, word) for x in spellcorrect]
    best = max(scores)
    if best > thresh:
        return spellcorrect[scores.index(best)]
    return None

# get correct spelling
[correctspell(x, good, thresh=70) for x in bad]  # ['Barcelona', 'Amsterdam', 'Prague']
