按交叉点合并2组列表

lst1 = [ ['apple', 1], ['banana', 1], ['orange', 1], ['123', 2], ['456', 2], ['abc', 3], ['ABC', 3], ['tony', 4], ['john', 4], ['jack', 4], ] lst2 = [ ['!@#', 1], ['apple', 2], ['banana', 2], ['strawberry', 2], ['lemon', 2], ['john', 3], ['tony', 3], ['adella', 3], ]

lst = [ ['apple', 1], ['banana', 1], ['orange', 1], ['strawberry', 1], ['lemon', 1], ['!@#', 2], ['john', 3], ['tony', 3], ['adella', 3], ['jack', 3], ['123', 4], ['456', 4], ['abc', 5], ['ABC', 5], ]

1条回答

网友

1楼 · 发布于 2024-09-29 06:22:37

这里有一个有效的解决方案。它不是最优的（O（n**2）），因为它需要将第一个列表的所有元素与第二个列表的所有元素进行比较。我希望有人能想出更好的算法，但与此同时：

from itertools import groupby

# group elements with common id and transform to set
def to_set(l):
    return [set(e[0] for e in g)
            for k,g in groupby(l, key=lambda x: x[1])]

# find first element of set_list that overlaps s1
def match_set(s1, set_list):
    for s2 in set_list:
        if len(s1.intersection(s2)) > 0:
            return s1.union(s2)
    return s1

sets1 = to_set(lst1)
sets2 = to_set(lst2)

# perform merge both ways (to have "outer join")
out = {tuple(sorted(match_set(s1, sets2))) for s1 in sets1}
out = out.union({tuple(sorted(match_set(s2, out))) for s2 in sets2})

# annotate with new group
out = [[v, i] for i,t in enumerate(out) for v in t]

输出：

[['apple', 0],
 ['banana', 0],
 ['lemon', 0],
 ['orange', 0],
 ['strawberry', 0],
 ['!@#', 1],
 ['123', 2],
 ['456', 2],
 ['ABC', 3],
 ['abc', 3],
 ['adella', 4],
 ['jack', 4],
 ['john', 4],
 ['tony', 4]]

相关问题更多 >

编程相关推荐

热门问题

热门文章