Merge two group lists by intersection

Posted 2024-09-29 06:22:37


I have two group lists whose elements have the form [name, group id]:

lst1 = [
    ['apple', 1],
    ['banana', 1],
    ['orange', 1],
    ['123', 2],
    ['456', 2],
    ['abc', 3],
    ['ABC', 3],
    ['tony', 4],
    ['john', 4],
    ['jack', 4],
]

lst2 = [
    ['!@#', 1],
    ['apple', 2],
    ['banana', 2],
    ['strawberry', 2],
    ['lemon', 2],    
    ['john', 3],    
    ['tony', 3],
    ['adella', 3],
]

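I want to merge the two lists by the intersection of their names, meaning that two groups are merged if they share the most common values (the final group ids do not matter). The result should look like this: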

lst = [
    ['apple', 1],
    ['banana', 1],
    ['orange', 1],
    ['strawberry', 1],
    ['lemon', 1],       
    ['!@#', 2],
    ['john', 3],    
    ['tony', 3],
    ['adella', 3],   
    ['jack', 3],    
    ['123', 4],
    ['456', 4],
    ['abc', 5],
    ['ABC', 5],   
]

How can I do this efficiently?


1 answer
User
#1 · Posted 2024-09-29 06:22:37

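Here is a working solution. It is not optimal (O(n**2)), because it has to compare every element of the first list with every element of the second. I hope someone comes up with a better algorithm, but in the meantime: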

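from itertools import groupby

# Group names that share an id and turn each group into a set.
# groupby only merges consecutive items, so each list must already
# be ordered by group id (as lst1 and lst2 are).
def to_set(pairs):
    return [set(e[0] for e in g)
            for k, g in groupby(pairs, key=lambda x: x[1])]

# Return s1 merged with the first element of set_list that overlaps it,
# or s1 unchanged when nothing overlaps.
def match_set(s1, set_list):
    for s2 in set_list:
        if s1.intersection(s2):
            return s1.union(s2)
    return s1

sets1 = to_set(lst1)
sets2 = to_set(lst2)

# Merge in both directions (like an "outer join") so that groups present
# in only one list are kept. Sorted tuples are hashable, so duplicates collapse.
out = {tuple(sorted(match_set(s1, sets2))) for s1 in sets1}
out = out.union({tuple(sorted(match_set(s2, out))) for s2 in sets2})

# Annotate every merged group with a new group id.
out = [[v, i] for i, t in enumerate(out) for v in t]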

Output (the group numbering may differ between runs, since set iteration order is not guaranteed):

[['apple', 0],
 ['banana', 0],
 ['lemon', 0],
 ['orange', 0],
 ['strawberry', 0],
 ['!@#', 1],
 ['123', 2],
 ['456', 2],
 ['ABC', 3],
 ['abc', 3],
 ['adella', 4],
 ['jack', 4],
 ['john', 4],
 ['tony', 4]]
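
The question asks to merge groups that share the most values, whereas match_set above stops at the first overlapping set. A minimal sketch of a largest-overlap lookup follows; the names to_sets and best_match and the sample data are hypothetical and not part of the original answer.

```python
from itertools import groupby


def to_sets(pairs):
    """Group [name, group_id] pairs (already ordered by id) into sets of names."""
    return [{name for name, _ in group}
            for _, group in groupby(pairs, key=lambda pair: pair[1])]


def best_match(s1, candidates):
    """Return the candidate sharing the most names with s1, or None if none overlap."""
    best = max(candidates, key=lambda s2: len(s1 & s2), default=None)
    if best is None or not (s1 & best):
        return None
    return best


# Hypothetical data: the first group overlaps both groups of the second list,
# sharing one name with the first and two names with the second.
groups1 = to_sets([['apple', 1], ['banana', 1], ['orange', 1]])
groups2 = to_sets([['apple', 1], ['tony', 1], ['john', 1],
                   ['banana', 2], ['lemon', 2], ['orange', 2]])

print(sorted(best_match(groups1[0], groups2)))  # ['banana', 'lemon', 'orange']
```

A first-match lookup would have picked the group containing 'apple' here, because it comes first. When two candidates tie, max keeps the earliest one.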

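On the wish for a better algorithm: one possible alternative, swapped in here and not taken from the answer, is a disjoint-set (union-find) structure over the names, which runs in near-linear time in the total number of pairs. It assumes that transitive merging is acceptable: if one group overlaps two groups of the other list, all three are fused, which is not necessarily what "most common values" means in ambiguous cases. The function name merge_groups is hypothetical.

```python
from collections import defaultdict


def merge_groups(*lists):
    """Merge [name, group_id] lists; names linked by any shared group end up together."""
    parent = {}

    def find(x):
        # Walk up to the root, halving the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for idx, pairs in enumerate(lists):
        first_in_group = {}
        for name, gid in pairs:
            # (idx, gid) keeps id 1 of one list distinct from id 1 of another.
            anchor = first_in_group.setdefault((idx, gid), name)
            union(name, anchor)

    merged = defaultdict(list)
    for name in list(parent):
        merged[find(name)].append(name)

    # Number groups in order of first appearance; the ids carry no meaning.
    return [[name, new_id]
            for new_id, names in enumerate(merged.values(), start=1)
            for name in names]


lst1 = [['apple', 1], ['banana', 1], ['orange', 1], ['123', 2], ['456', 2],
        ['abc', 3], ['ABC', 3], ['tony', 4], ['john', 4], ['jack', 4]]
lst2 = [['!@#', 1], ['apple', 2], ['banana', 2], ['strawberry', 2], ['lemon', 2],
        ['john', 3], ['tony', 3], ['adella', 3]]

for name, gid in merge_groups(lst1, lst2):
    print(name, gid)
```

With the lists from the question this produces the same five groups as the expected result, although the group ids differ. Unlike the groupby version, it does not require the input lists to be ordered by group id.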