Java similarity score with Levenshtein


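I implemented the Levenshtein algorithm in Java and now get the number of corrections the algorithm makes, i.e. the cost. That helps a little, but not much, because I want the result as a percentage.

So I would like to know how to calculate that similarity score.

I would also like to know how you do it, and why.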

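6 answers in total

# Answer 1

Apache Commons Text provides a LevenshteinDistance class.

It is available as a Maven dependency.

I do think using this implementation is better than writing your own.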

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.3</version>
</dependency>

As an example, take a look at the code below:

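import org.apache.commons.text.similarity.LevenshteinDistance;

public class MetricUtils {
    // Reusable distance calculator from Apache Commons Text
    private static final LevenshteinDistance LEVENSHTEIN = new LevenshteinDistance();

    public static void main(String[] args) {
        String s = "running";
        String s1 = "runninh";
        System.out.println(levenshteinRatio(s, s1));
    }

    // Similarity in [0, 1]: 1 minus the distance normalized by the longer length
    public static double levenshteinRatio(String s, String s1) {
        int maxLength = Math.max(s.length(), s1.length());
        if (maxLength == 0) {
            return 1.0; // two empty strings are identical; avoids 0/0
        }
        return 1 - ((double) LEVENSHTEIN.apply(s, s1)) / maxLength;
    }
}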

# Answer 2
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. (Wikipedia)
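- So a Levenshtein distance of 0 means the two strings are equal.
- The maximum Levenshtein distance (all characters differ) is max(string1.length, string2.length).

So if you need a percentage, you have to scale the distance by that maximum. For example:

"Hallo", "Hello" -> Levenshtein distance 1. The maximum Levenshtein distance for these two strings is 5, so 20% of the characters do not match.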
```
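String s1 = "Hallo";
String s2 = "Hello";
// calculateLevensteinDistance is assumed to be your own Levenshtein implementation
int lfd = calculateLevensteinDistance(s1, s2);
// Fraction of characters that do not match (0.2 here); the similarity is 1 - ratio
double ratio = ((double) lfd) / (Math.max(s1.length(), s2.length()));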
```
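# Answer 3

You can download Apache Commons StringUtils and study (and perhaps use) its implementation of the Levenshtein distance algorithm.

# Answer 4

To compute the score, you need the maximum possible cost (insertion + deletion + substitution). Then use the following formula: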
```
score = 1 - actual_cost/max_possible_cost
```
See this reference: Levenshtein Score Calculation Func
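
The formula above could be sketched in Java roughly as follows. This is only a minimal sketch: it assumes a cost of 1 per edit and takes the maximum possible cost to be the length of the longer string (the bound the other answers use). The class and method names are made up for illustration.

```java
class LevenshteinScore {

    // Two-row dynamic-programming Levenshtein distance with unit costs.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev; // reuse the two rows instead of allocating new ones
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // score = 1 - actual_cost / max_possible_cost, in the range [0, 1].
    static double score(String a, String b) {
        int maxCost = Math.max(a.length(), b.length());
        if (maxCost == 0) {
            return 1.0; // assumption: two empty strings count as identical
        }
        return 1.0 - (double) distance(a, b) / maxCost;
    }

    public static void main(String[] args) {
        // Multiply by 100 to report the score as a percentage.
        System.out.println(100 * score("Hallo", "Hello"));
    }
}
```

If your cost model weights insertions, deletions and substitutions differently, the maximum possible cost changes as well, so the denominator would need to be adapted.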

# Answer 5

// Self-contained example: Levenshtein distance plus a percentage match

public class Demo {

    public static void main(String[] args) {
        String str1 = "12345";
        String str2 = "122345";

        int re = percentageOfTextMatch(str1, str2);
        System.out.println("Matching Percent " + re);
    }

    public static int percentageOfTextMatch(String s0, String s1) {
        // Trim and collapse runs of whitespace into a single space
        s0 = s0.trim().replaceAll("\\s+", " ");
        s1 = s1.trim().replaceAll("\\s+", " ");
        int totalLength = s0.length() + s1.length();
        if (totalLength == 0) {
            return 100; // two empty strings match completely; avoids 0/0
        }
        // Note: the distance is normalized by the sum of both lengths, so two
        // completely different strings of equal length score 50, not 0
        return (int) (100 - (float) levenshteinDistance(s0, s1) * 100 / (float) totalLength);
    }

    public static int levenshteinDistance(String s0, String s1) {
        int len0 = s0.length() + 1;
        int len1 = s1.length() + 1;

        // the arrays of distances (previous row and current row)
        int[] cost = new int[len0];
        int[] newcost = new int[len0];

        // initial cost of skipping prefix in String s0
        for (int i = 0; i < len0; i++) {
            cost[i] = i;
        }

        // dynamically computing the array of distances

        // transformation cost for each letter in s1
        for (int j = 1; j < len1; j++) {

            // initial cost of skipping prefix in String s1
            // (must be j, not j - 1, or the distance is underestimated)
            newcost[0] = j;

            // transformation cost for each letter in s0
            for (int i = 1; i < len0; i++) {

                // matching current letters in both strings
                int match = (s0.charAt(i - 1) == s1.charAt(j - 1)) ? 0 : 1;

                // computing cost for each transformation
                int costReplace = cost[i - 1] + match;
                int costInsert = cost[i] + 1;
                int costDelete = newcost[i - 1] + 1;

                // keep minimum cost
                newcost[i] = Math.min(Math.min(costInsert, costDelete), costReplace);
            }

            // swap cost/newcost arrays
            int[] swap = cost;
            cost = newcost;
            newcost = swap;
        }

        // the distance is the cost for transforming all letters in both strings
        return cost[len0 - 1];
    }
}

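# Answer 6

The maximum Levenshtein distance between two strings is the larger of the two string lengths. (That corresponds to substituting every character up to the length of the shorter string, plus insertions or deletions depending on whether you are going from the shorter string to the longer one or the other way round.) Given that, the similarity of the two strings is the difference between that maximum and the actual Levenshtein distance, divided by that maximum.

Implementations of the Levenshtein algorithm tend not to record what those edits should be, but given the abstract algorithm on the Wikipedia page, working them out should not be that difficult.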
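
One possible way to recover the edits, sketched here under a few assumptions: keep the full distance table instead of only two rows, then walk back from the bottom-right cell. The class and method names below are hypothetical, every edit costs 1, and ties are broken in favor of substitution, then deletion, then insertion, so the result is just one of possibly several optimal edit scripts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LevenshteinEdits {

    // Fills the full (a.length + 1) x (b.length + 1) distance table with unit costs.
    static int[][] table(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d;
    }

    // Walks back from the bottom-right cell and reports one optimal edit script.
    // Positions are indexes into the first string, a.
    static List<String> edits(String a, String b) {
        int[][] d = table(a, b);
        Deque<String> ops = new ArrayDeque<>();
        int i = a.length();
        int j = b.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && a.charAt(i - 1) == b.charAt(j - 1) && d[i][j] == d[i - 1][j - 1]) {
                i--; // characters match, no edit needed
                j--;
            } else if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                ops.push("substitute '" + a.charAt(i - 1) + "' with '" + b.charAt(j - 1) + "' at " + (i - 1));
                i--;
                j--;
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                ops.push("delete '" + a.charAt(i - 1) + "' at " + (i - 1));
                i--;
            } else {
                ops.push("insert '" + b.charAt(j - 1) + "' at " + i);
                j--;
            }
        }
        // push() adds to the front, so the deque is already in left-to-right order
        return new ArrayList<>(ops);
    }

    public static void main(String[] args) {
        edits("Hallo", "Hello").forEach(System.out::println);
    }
}
```

The number of reported edits equals the Levenshtein distance, so the same table gives you both the similarity ratio and the list of changes. The trade-off is memory: the full table needs space proportional to the product of the two lengths, whereas the two-row variant only needs space proportional to one of them.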
