A Cython implementation of the true Damerau-Levenshtein algorithm.

Detailed description of the fastDamerauLevenshtein Python project


fastDamerauLevenshtein


A Cython implementation of the true Damerau-Levenshtein edit distance, which allows one item to be edited more than once. More information from Wikipedia:

In information theory and computer science, the Damerau-Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein) is a string metric for measuring the edit distance between two sequences. Informally, the Damerau-Levenshtein distance between two words is the minimum number of operations (consisting of insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one word into the other.
The Damerau-Levenshtein distance differs from the classical Levenshtein distance by including transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions).

This implementation is based on James M. Jensen II's explanation, and it allows the cost of each operation to be specified.
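To make the definition concrete, here is a minimal pure-Python sketch of the unrestricted (true) Damerau-Levenshtein distance with per-operation costs. It is not the library's Cython code: the function name `damerau_levenshtein_reference` and its parameter names are hypothetical, and the weighted transposition term is one common formulation that is assumed here rather than taken from the project's source.

```python
def damerau_levenshtein_reference(a, b, delete=1, insert=1, replace=1, swap=1):
    """Unrestricted Damerau-Levenshtein distance between two sequences.

    Illustrative sketch only; names and the weighted transposition
    formula are assumptions, not the library's implementation.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # The table is shifted by one in both directions: d[i + 1][j + 1] is the
    # cost of turning a[:i] into b[:j]. Row 0 and column 0 stay at infinity
    # and act as sentinels for "no earlier occurrence".
    d = [[inf] * (m + 2) for _ in range(n + 2)]
    for i in range(n + 1):
        d[i + 1][1] = i * delete
    for j in range(m + 1):
        d[1][j + 1] = j * insert

    last_row = {}  # element -> last 1-based position in a where it occurred
    for i in range(1, n + 1):
        last_col = 0  # last 1-based position in b that matched a[i - 1]
        for j in range(1, m + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = replace
            d[i + 1][j + 1] = min(
                d[i][j] + cost,        # match or replace
                d[i + 1][j] + insert,  # insert b[j - 1]
                d[i][j + 1] + delete,  # delete a[i - 1]
                # swap a[k - 1] and a[i - 1], deleting what lies between
                # them in a and inserting what lies between them in b
                d[k][l] + (i - k - 1) * delete + swap + (j - l - 1) * insert,
            )
        last_row[a[i - 1]] = i
    return d[n + 1][m + 1]


print(damerau_levenshtein_reference("ca", "abc"))          # 2
print(damerau_levenshtein_reference(["ab", "bc"], ["ab"]))  # 1
```

Because the elements are only compared and used as dictionary keys, the same function works on strings and on lists of hashable items.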

Requirements

This code requires Python 2.7 or 3.4+ and a C compiler such as GCC.

Installation

fastDamerauLevenshtein is available on PyPI at https://pypi.python.org/pypi/fastDamerauLevenshtein.

Install with pip:

pip install fastDamerauLevenshtein

Install from source:

python setup.py install

or

pip install .

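Usage

The package provides a method called damerauLevenshtein, which computes the distance between two sequences of hashable items (strings, lists of strings, etc.). The method accepts the following parameters: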

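  • firstObject

  • secondObject

  • similarity

    • If this parameter is False, the total cost of the edits is returned; otherwise a score from 0.0 to 1.0 is returned, indicating how similar the two objects are. Defaults to True.
  • deleteWeight

    • Cost of a delete operation.
  • insertWeight

    • Cost of an insert operation.
  • replaceWeight

    • Cost of a replace operation.
  • swapWeight

    • Cost of a swap operation.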

The operation weights must be int values. By default, all of them are 1.

Basic usage:

from fastDamerauLevenshtein import damerauLevenshtein
damerauLevenshtein('ca', 'abc', similarity=False)  # expected result: 2.0
damerauLevenshtein('car', 'cars', similarity=True)  # expected result: 0.75
damerauLevenshtein(['ab', 'bc'], ['ab'], similarity=False)  # expected result: 1.0
damerauLevenshtein(['ab', 'bc'], ['ab'], similarity=True)  # expected result: 0.5
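The expected results above are consistent with a similarity score of one minus the distance divided by the length of the longer input, at least for the default weights of 1. The sketch below only illustrates that relationship. The helper name `similarity_from_distance` is hypothetical, and the normalization is inferred from the examples, not taken from the library's source; with custom weights the library may normalize differently.

```python
def similarity_from_distance(distance, first, second):
    """Map an edit distance to a 0.0-1.0 similarity score.

    Assumption: similarity = 1 - distance / max(len(first), len(second)),
    which matches the documented examples for unit weights.
    """
    longest = max(len(first), len(second))
    if longest == 0:
        # Two empty sequences are treated as identical.
        return 1.0
    return 1.0 - distance / float(longest)


# 'car' -> 'cars' needs one insertion; the longer input has 4 items.
print(similarity_from_distance(1, "car", "cars"))          # 0.75
# ['ab', 'bc'] -> ['ab'] needs one deletion; the longer input has 2 items.
print(similarity_from_distance(1, ["ab", "bc"], ["ab"]))   # 0.5
```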

Benchmark

Other Python Damerau-Levenshtein and OSA implementations:

Python 3.7 (on an Intel i5 6500):

>>> import timeit
>>> #fastDamerauLevenshtein:
... timeit.timeit(setup="import fastDamerauLevenshtein; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="fastDamerauLevenshtein.damerauLevenshtein(text1, text2)", number=100000)
0.43
>>> #pyxDamerauLevenshtein:
... timeit.timeit(setup="from pyxdameraulevenshtein import normalized_damerau_levenshtein_distance; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="normalized_damerau_levenshtein_distance(text1, text2)", number=100000)
2.44
>>> #jellyfish
... timeit.timeit(setup="import jellyfish; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="jellyfish.damerau_levenshtein_distance(text1, text2)", number=100000)
0.20
>>> #editdistance
... timeit.timeit(setup="import editdistance; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="editdistance.eval(text1, text2)", number=100000)
0.22
>>> #textdistance
... timeit.timeit(setup="import textdistance; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="textdistance.damerau_levenshtein.distance(text1, text2)", number=100000)
0.70
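When reading these timings, keep in mind that OSA (optimal string alignment) is a restricted variant of the distance in which no substring may be edited more than once, so OSA and true Damerau-Levenshtein implementations do not always return the same value. The pure-Python sketch below shows the OSA recurrence with unit costs; the function name `osa_distance` is hypothetical and the code is not taken from any of the libraries benchmarked above.

```python
def osa_distance(a, b):
    """Optimal string alignment distance with unit costs (illustrative)."""
    n, m = len(a), len(b)
    # d[i][j] is the cost of turning a[:i] into b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Only directly adjacent items may be transposed, and the
            # transposed pair cannot be edited again afterwards.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


print(osa_distance("ca", "abc"))  # 3
print(osa_distance("ab", "ba"))   # 1
```

For 'ca' and 'abc' the OSA distance is 3, while the true Damerau-Levenshtein distance is 2 (swap to 'ac', then insert 'b'), because OSA cannot insert between two items it has already transposed.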

License

It is released under the MIT license.

Copyright (c) 2019 Robert Grigoroiu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
