Do strings with overlapping words score higher than strings without overlapping words?

1 answer

Anonymous user

Post #1 · Posted on 2024-10-03 02:44:00

Fuzzywuzzy is implemented using the Levenshtein distance. From Wikipedia:

Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
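To make that definition concrete, here is a minimal pure-Python sketch of the textbook dynamic-programming algorithm. It is only an illustration (python-Levenshtein does this in C), and the function name levenshtein is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic DP, two rows)."""
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> empty string needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```

Only two rows of the table are kept at a time, so memory is proportional to the length of the second string.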

Edit: As @dennis golomazov pointed out, there are important differences in detail between token_sort_ratio and token_set_ratio.

token_sort_ratio has four steps:

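Split each string into tokens
Sort the tokens and join them back into a single string
Call the Levenshtein ratio from https://github.com/ztane/python-Levenshtein on the sorted-token strings
Return the ratio * 100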
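Step 3 does not use the plain edit distance directly. The sketch below reproduces the four steps in pure Python, assuming python-Levenshtein's ratio convention: a substitution counts as two edits, and the score is (len(a) + len(b) - distance) / (len(a) + len(b)). The names indel_ratio and token_sort_score are hypothetical, and fuzzywuzzy's extra preprocessing (lowercasing, stripping punctuation) is left out.

```python
def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1], assuming the python-Levenshtein convention:
    substitutions cost 2, score = (lensum - distance) / lensum."""
    lensum = len(a) + len(b)
    if lensum == 0:
        return 1.0  # two empty strings are identical
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (0 if ca == cb else 2),  # substitution counts double
            ))
        prev = curr
    return (lensum - prev[-1]) / lensum


def token_sort_score(s1: str, s2: str) -> int:
    # Steps 1-2: split on whitespace, sort the tokens, rejoin with single spaces
    sorted1 = " ".join(sorted(s1.split()))
    sorted2 = " ".join(sorted(s2.split()))
    # Steps 3-4: ratio of the sorted strings, scaled to 0-100 and rounded
    return round(indel_ratio(sorted1, sorted2) * 100)


print(token_sort_score("tender pork chop", "pork chop tender"))  # → 100
```

Because the tokens are sorted before comparing, word order stops mattering, but a single shared word such as "corn" earns no special credit: it is just a few matching characters inside a longer string.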

Note that this algorithm does not care about partial matches.

When these steps are applied to your strings, the code essentially becomes:

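from Levenshtein import StringMatcher as sm

def sort_tokens(s):
    # Steps 1-2: split on whitespace, sort the tokens, rejoin with single spaces
    return " ".join(sorted(s.split()))

def token_sort(s1, s2):
    # Step 3: Levenshtein ratio of the two sorted-token strings
    m = sm.StringMatcher(None, sort_tokens(s1), sort_tokens(s2))
    # Step 4: scale to 0-100; fuzzywuzzy rounds to the nearest integer
    return int(round(m.ratio() * 100))

print(token_sort("chop loin moist tender pork", "bicolor corn"))
print(token_sort("corn cut store sweet tray yellow", "bicolor corn"))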

You'll notice that these ratios match the ones you saw in your test cases.

So you definitely want to use fuzz.token_set_ratio, because it takes into account that "corn" appears in both strings and can match them accordingly.
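To see why token_set_ratio helps here, the following is a simplified pure-Python sketch of the token-set idea, modeled on roughly how fuzzywuzzy builds it: the shared tokens are compared with each string rearranged as "shared tokens + its own unique tokens", and the best of three scores wins. The function names are hypothetical, preprocessing and the partial variant are omitted, and indel_ratio assumes python-Levenshtein's convention that a substitution costs 2.

```python
def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumes substitutions cost 2 (indel distance)."""
    lensum = len(a) + len(b)
    if lensum == 0:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (0 if ca == cb else 2)))
        prev = curr
    return (lensum - prev[-1]) / lensum


def token_set_score(s1: str, s2: str) -> int:
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    common = " ".join(sorted(tokens1 & tokens2))  # tokens found in both
    rest1 = " ".join(sorted(tokens1 - tokens2))   # tokens only in s1
    rest2 = " ".join(sorted(tokens2 - tokens1))   # tokens only in s2
    # shared tokens first, followed by the tokens unique to each string
    combined1 = f"{common} {rest1}".strip()
    combined2 = f"{common} {rest2}".strip()
    candidates = [
        indel_ratio(common, combined1),
        indel_ratio(common, combined2),
        indel_ratio(combined1, combined2),
    ]
    return round(max(candidates) * 100)


print(token_set_score("corn cut store sweet tray yellow", "bicolor corn"))  # → 50
```

With this sketch, the pair that shares "corn" scores 50 because "corn" versus "corn bicolor" is a half match, while the pair with no shared token has an empty intersection, falls back to comparing the whole sorted strings, and scores lower.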
