Finding the closest spelling with the same first letter using different distance metrics - Q&A - Python中文网


Posted 2024-09-26 22:09:50



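I am trying to write a function that finds the closest spelling of a (possibly misspelled) word, using different n-gram sizes and distance metrics.

Here is what I have so far: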

from nltk.corpus import words
from nltk import ngrams
from nltk.metrics.distance import edit_distance, jaccard_distance

first_letters = ['A','B','C']
spellings = words.words()

def recommendation(word):
    n = 3  # n means 'n'-grams, here I use 3 as an example
    spellings_new = [w for w in spellings if (w[0] in first_letters)]
    # ______ is the distance measure
    dists = [________(set(ngrams(word, n)), set(ngrams(w, n))) for w in spellings_new]
    return spellings_new[dists.index(min(dists))]

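The rest seems straightforward, but I don't know how to specify the "same first letter" condition. Specifically, if the misspelled word starts with the letter "A", the correction suggested from words.words(), the one with the smallest distance to the misspelled word, should also start with "A", and so on for every other letter. As the code above shows, I use (w[0] in first_letters) as my first-letter condition, but it does not do the job: the function keeps returning words that start with a different letter than the input. I have not found a similar thread on this board that solves my problem, so I would appreciate it if someone could show me how to specify the first-letter condition. If this question has been asked before and is considered inappropriate, I will delete it.

Thanks.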
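To make the symptom concrete, here is a minimal sketch of why (w[0] in first_letters) cannot enforce the same-first-letter condition: the test never looks at the input word, so every candidate that begins with any listed letter survives the filter. The short candidates list is a made-up stand-in for words.words(), and filter_by_fixed_letters and filter_by_word are hypothetical helper names used only for this illustration.

```python
# Hypothetical stand-in for words.words(); the real corpus is much larger.
candidates = ["Apple", "Banana", "Cherry", "Avocado", "Date"]
first_letters = ["A", "B", "C"]


def filter_by_fixed_letters(word):
    # Mirrors the filter from the question: `word` is never consulted.
    return [w for w in candidates if w[0] in first_letters]


def filter_by_word(word):
    # Compares each candidate's first letter with the input word's first letter.
    return [w for w in candidates if w[0] == word[0]]


print(filter_by_fixed_letters("Bananna"))  # keeps words starting with A, B, or C
print(filter_by_word("Bananna"))           # keeps only words starting with B
```

With the fixed-letter filter, the nearest candidate can start with any of the listed letters, which matches the behaviour described above.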


1 answer

User

Reply #1 · Posted 2024-09-26 22:09:50

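You're really close. w[0] == word[0] can be used to check whether the first letters match. After that, set(w) and set(word) turn the two words into sets of letters. I then passed those sets to jaccard_distance, only because that is what you had already imported. There may well be a better solution.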

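from nltk.corpus import words
from nltk.metrics.distance import jaccard_distance

spellings = words.words()

def recommendation(word):
    # Keep only candidates that share the first letter with the input word.
    spellings_new = [w for w in spellings if w[0] == word[0]]
    # Jaccard distance between the letter sets of each candidate and the input.
    dists = [jaccard_distance(set(w), set(word)) for w in spellings_new]
    return spellings_new[dists.index(min(dists))]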
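
The answer compares whole letter sets, while the question asked about n-grams. As a rough sketch of how the same first-letter filter could be combined with character n-grams, the block below uses plain Python so that it runs without downloading the NLTK corpus. The helpers char_ngrams, jaccard and recommend_ngram, and the small vocab list, are assumptions made for this illustration and are not part of NLTK; in the original setting the vocabulary would be words.words().

```python
def char_ngrams(word, n):
    # Set of all contiguous character n-grams in `word` (empty if len(word) < n).
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b):
    # Jaccard distance: 1 - |A & B| / |A | B|; treated as 0.0 when both sets are empty.
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def recommend_ngram(word, vocabulary, n=3):
    # Only candidates sharing the input word's first letter are considered.
    same_initial = [w for w in vocabulary if w and w[0] == word[0]]
    if not same_initial:
        return None  # no candidate starts with that letter
    target = char_ngrams(word, n)
    # Pick the candidate whose n-gram set is closest to the input's n-gram set.
    return min(same_initial, key=lambda w: jaccard(char_ngrams(w, n), target))


# Hypothetical miniature vocabulary standing in for words.words().
vocab = ["correspond", "corpulent", "indecent", "validate", "valid"]
print(recommend_ngram("cormulent", vocab))
```

Returning None when no candidate shares the first letter is one possible design choice; it avoids the error that min() raises on an empty list.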
