Sort a list by the best match in another list

Posted 2024-09-28 17:18:19


I am trying to sort one list by the order of another list, but their elements are not 100% identical.

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

Out: list1_ordered_by_list2 = ["2banana", "mango", "1 apple"]

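I am happy to use jellyfish.levenshtein_distance() for the comparison, but I am not sure how to compare each element of list1 with each element of list2 and return list1 sorted in the order of list2.

It is worth mentioning that my two lists have the same length. However, a more general solution would be very valuable.

If the two lists have different numbers of items, getting a mapping between them would be a bonus, e.g.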

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2", "apple"]

Out: list1_ordered_by_list2 = ["1 apple", "2banana", "mango"]

This may be fairly complicated. Let me know if further clarification is needed. I hope you can help. Thanks.


3 answers

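The following is more of a comment than an answer.

Note that the code below prints the actual ld values. We can see that

(mango) <-> (apple2) has a "better" ld than (mango) <-> (0.5 mango 1-)

The last line of the output pairs each element of list1 with the index of its closest match in list2.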

from jellyfish import levenshtein_distance as ld

list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]
list3 = []
for x in list1:
    offset = 0
    for idx, y in enumerate(list2):
        ld_value = ld(x, y)
        print('({}) <-> ({})  > {}'.format(x,y,ld_value))
        if idx == 0:
            _min = ld_value
            continue
        else:
            if ld_value < _min:
                _min = ld_value
                offset = idx
    list3.append((x, offset))
    print()
print(list3)

Output

(1 apple) <-> (3bana2na 2+)  > 10
(1 apple) <-> (0.5 mango 1-)  > 10
(1 apple) <-> (apple2)  > 3

(2banana) <-> (3bana2na 2+)  > 5
(2banana) <-> (0.5 mango 1-)  > 10
(2banana) <-> (apple2)  > 7

(mango) <-> (3bana2na 2+)  > 9
(mango) <-> (0.5 mango 1-)  > 7
(mango) <-> (apple2)  > 6

[('1 apple', 2), ('2banana', 0), ('mango', 2)]
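The same nearest-match lookup can be written more compactly. The sketch below is not part of the original answer: it swaps jellyfish for a small pure-Python Levenshtein function so that it runs without third-party packages, and the names `levenshtein` and `best_match_index` are made up for illustration.

```python
def levenshtein(a, b):
    """Edit distance between a and b (illustrative stand-in for jellyfish)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_match_index(item, candidates):
    """Index of the closest candidate; the first one wins on ties."""
    distances = [levenshtein(item, c) for c in candidates]
    return min(range(len(candidates)), key=distances.__getitem__)


list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

matches = [(x, best_match_index(x, list2)) for x in list1]
print(matches)  # [('1 apple', 2), ('2banana', 0), ('mango', 2)]
```

As in the loop above, "1 apple" and "mango" both land on index 2, so a plain nearest-match lookup does not give a one-to-one mapping.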

Using Lior's rank function, you can get the example output with difflib:

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2", "apple"]

import difflib

def rank(x):
    dist = [len(list(difflib.ndiff(x, s))) for s in list2]
    return dist.index(min(dist))

>>> sorted(list1, key=rank)
['1 apple', '2banana', 'mango']

Or with your first example:

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

>>> sorted(list1, key=rank)
['2banana', '1 apple', 'mango']

It may be faster to use a form of fuzzy matching against the reference list. You can use the regex module or get_close_matches from difflib:

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

import difflib

def rank2(s, ref=list2):
    try:
        w=difflib.get_close_matches(s, ref)
        return ref.index(w[0])
    except IndexError:
        return len(ref)+1

>>> sorted(list1, key=rank2)
['2banana', '1 apple', 'mango']
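get_close_matches only returns candidates whose similarity is at least its cutoff (0.6 by default), which is why rank2 needs the IndexError fallback. As a rough alternative that never comes back empty, one could score every reference string with difflib.SequenceMatcher.ratio() and take the index of the highest score. The helper name `rank_by_ratio` below is hypothetical, and this is only a sketch of that idea.

```python
from difflib import SequenceMatcher


def rank_by_ratio(item, ref):
    """Index of the reference string most similar to item (hypothetical helper)."""
    ratios = [SequenceMatcher(None, item, candidate).ratio() for candidate in ref]
    return ratios.index(max(ratios))  # first index wins on ties


list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

ordered = sorted(list1, key=lambda s: rank_by_ratio(s, list2))
print(ordered)
```

Because ratio() is a similarity rather than an edit distance, the resulting order may differ from the Levenshtein-based answers.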

You need to create a ranking function based on jellyfish.levenshtein_distance() that returns the index of the minimum distance, and pass it to sorted as the key:

from jellyfish import levenshtein_distance as ld

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple23"]

def rank(x):
    dist = [ld(x, s) for s in list2]
    return dist.index(min(dist))

print(sorted(list1, key=rank))  #  > ['2banana', 'mango', '1 apple']
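A nearest-match rank can send two items to the same reference slot. With the question's original list2, both "1 apple" and "mango" are closest to "apple2". One possible way around that, sketched here under the assumption that a greedy assignment is good enough, is to hand out reference slots to the closest pairs first and let each slot be used only once. The function names are illustrative, and the pure-Python `levenshtein` stands in for jellyfish.

```python
def levenshtein(a, b):
    """Edit distance between a and b (illustrative stand-in for jellyfish)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def order_by_unique_match(items, ref):
    """Greedy one-to-one matching (hypothetical helper, not an optimal assignment)."""
    # All (distance, item index, ref index) triples, closest pairs first.
    pairs = sorted(
        (levenshtein(x, y), i, j)
        for i, x in enumerate(items)
        for j, y in enumerate(ref)
    )
    slot_of = {}   # item index -> ref index
    taken = set()  # ref indices already assigned
    for _, i, j in pairs:
        if i not in slot_of and j not in taken:
            slot_of[i] = j
            taken.add(j)
    # Items that got no slot (ref shorter than items) are placed last.
    order = sorted(range(len(items)), key=lambda i: slot_of.get(i, len(ref)))
    return [items[i] for i in order]


list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

print(order_by_unique_match(list1, list2))  # ['2banana', 'mango', '1 apple']
```

Here "1 apple" claims "apple2" first (distance 3), so "mango" falls back to "0.5 mango 1-" (distance 7). A greedy pass like this is simple but not guaranteed to minimise the total distance.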
