Combining lists with overlapping elements

Posted 2024-10-02 12:28:27


I have a collection of lists, some of which have overlapping elements:

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'], 
        ['aaaa', 'bbbb'], 
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc','aaaa'],
        ['eeee','eeef','gggg','gggi'],
        ['gggg','hhhh','iiii']]

I just want to pool the overlapping lists together, which would yield

pooled = [['aaaa', 'aaab', 'abaa','bbbb','cccc'], 
          ['eeee','eeef','gggg','gggi','hhhh','iiii'],
          ['dddd', 'dddd']]

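(In case it isn't clear: the first and second lists both overlap with the third, so they should be merged together even though they share no elements with each other.)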

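By "overlap" I mean that two lists have at least one element in common. By "merge" I mean combining two lists into a single flat list or a single flat set.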
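
As a rough sketch of these two definitions (the helper names `overlaps` and `merge` are hypothetical, chosen only for illustration):

```python
def overlaps(a, b):
    """Return True if lists a and b share at least one element."""
    return not set(a).isdisjoint(b)


def merge(a, b):
    """Return the union of a and b as a single flat set."""
    return set(a) | set(b)


print(overlaps(['aaaa', 'aaab', 'abaa'], ['aaaa', 'bbbb']))  # True
print(overlaps(['aaaa', 'aaab', 'abaa'], ['bbbb', 'bbbb']))  # False
print(merge(['aaaa', 'bbbb'], ['bbbb', 'bbbb']) == {'aaaa', 'bbbb'})  # True
```

The second call prints False, yet those two lists still belong to the same pooled group because each overlaps with `['aaaa', 'bbbb']`.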

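There may be several groups: for example, x, y, and z overlap with one another, and v and w overlap with each other, but x+y+z does not overlap with v+w. There may also be lists that overlap with nothing.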

(An analogy is families. Link all the Montagues together and all the Capulets together, but no Montague has ever married a Capulet, so the two groups stay distinct.)

I don't care whether duplicate items appear multiple times in the result or not.

What is a simple and reasonably fast way to do this in Python?

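Edit: This does not appear to be a duplicate of "Yet another merging list of lists, but most pythonic way", because that question does not seem to consider groups that overlap only through a third set. The solutions I tried from that question did not give the answer I am looking for here.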


3 answers

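Here is one approach (assuming you want unique elements in the overlapping result). Note that it pools all overlapping lists into a single group rather than separating independent clusters: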

def over(coll):
     print('Input is:\n', coll)
     # gather the lists that do overlap 
     overlapping = [x for x in coll if any(x_element in [y for k in coll if k != x for y in k] for x_element in x)] 
     # flatten and get unique 
     overlapping = sorted(list(set([z for x in overlapping for z in x]))) 
     # get the rest
     non_overlapping = [x for x in coll if all(y not in overlapping for y in x)] 
     # use the line below only if merged non-overlapping elements are desired
     # non_overlapping = sorted([y for x in non_overlapping for y in x]) 
     print('Output is:\n', [overlapping, non_overlapping])

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'], 
        ['aaaa', 'bbbb'], 
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc','aaaa']]
over(coll)
coll = [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
over(coll)

Output:

$ python3 over.py
Input is:
 [['aaaa', 'aaab', 'abaa'], ['bbbb', 'bbbb'], ['aaaa', 'bbbb'], ['dddd', 'dddd'], ['bbbb', 'bbbb', 'cccc', 'aaaa']]
Output is:
 [['aaaa', 'aaab', 'abaa', 'bbbb', 'cccc'], [['dddd', 'dddd']]]
Input is:
 [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
Output is:
 [[], [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]]


You can do this on sets with a successive-merge approach:

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'], 
        ['aaaa', 'bbbb'], 
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc','aaaa'],
        ['eeee','eeef','gggg','gggi'],
        ['gggg','hhhh','iiii']]

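# start with one set per input list (duplicates within a list collapse)
pooled = [set(subList) for subList in coll]
merging = True
while merging:  # repeat until a full pass finds nothing left to merge
    merging = False
    for i, group in enumerate(pooled):
        # first later set that shares at least one element with this group
        merged = next((g for g in pooled[i+1:] if g.intersection(group)), None)
        if not merged:
            continue
        group.update(merged)
        pooled.remove(merged)
        merging = True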

print(pooled)
# [{'aaaa', 'abaa', 'aaab', 'cccc', 'bbbb'}, {'dddd'}, {'gggg', 'eeef', 'eeee', 'hhhh', 'gggi', 'iiii'}]
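
The same idea can also be written as a single pass without the outer `while` loop. The sketch below is a variant, not the code above: it keeps a list of mutually disjoint groups and folds each new list into every group it touches. The function name `pool_overlapping` is an assumption made for this example.

```python
def pool_overlapping(coll):
    """Pool lists that share elements, directly or through other lists."""
    groups = []  # invariant: the sets in `groups` are pairwise disjoint
    for items in coll:
        current = set(items)
        untouched = []
        for group in groups:
            if group & current:
                # The new list links this group to `current`, so absorb it.
                current |= group
            else:
                untouched.append(group)
        untouched.append(current)
        groups = untouched
    return groups


coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'],
        ['aaaa', 'bbbb'],
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc', 'aaaa'],
        ['eeee', 'eeef', 'gggg', 'gggi'],
        ['gggg', 'hhhh', 'iiii']]

for group in pool_overlapping(coll):
    print(sorted(group))
# ['dddd']
# ['aaaa', 'aaab', 'abaa', 'bbbb', 'cccc']
# ['eeee', 'eeef', 'gggg', 'gggi', 'hhhh', 'iiii']
```

Because the existing groups never overlap one another, a group that was disjoint from the new list stays disjoint after other groups are absorbed, so one pass over the input is enough.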

Following alkasm's suggestion in the comments, I used networkx:

import networkx as nx

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'], 
        ['aaaa', 'bbbb'], 
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc','aaaa'],
        ['eeee','eeef','gggg','gggi'],
        ['gggg','hhhh','iiii']]

edges = []
for i in range(len(coll)):
    a = coll[i]
    for j in range(len(coll)):
        if i != j:
            b = coll[j]
            if set(a).intersection(set(b)):
                edges.append((i,j))

G = nx.Graph()
G.add_nodes_from(range(len(coll)))
G.add_edges_from(edges)

for c in nx.connected_components(G):
    combined_lists = [coll[i] for i in c]
    flat_list = [item for sublist in combined_lists for item in sublist]
    print(set(flat_list))

Output:

{'cccc', 'bbbb', 'aaab', 'aaaa', 'abaa'}
{'dddd'}
{'eeef', 'eeee', 'hhhh', 'gggg', 'gggi', 'iiii'}

No doubt this could be optimized, but it seems to solve my problem for now.
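
One possible optimization, sketched here as an assumption rather than as part of the answer above: the pairwise loop compares every list with every other list, which is quadratic in the number of lists. A disjoint-set (union-find) structure over the elements avoids that by linking each list's elements to its first element. This swaps networkx for plain Python; the names `pool_with_union_find`, `find`, and `union` are hypothetical.

```python
from collections import defaultdict


def pool_with_union_find(coll):
    """Group elements that are connected through shared lists."""
    parent = {}

    def find(x):
        # Walk up to the root, halving the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for items in coll:
        if not items:
            continue  # an empty list belongs to no group
        first = items[0]
        find(first)  # register single-element lists too
        for other in items[1:]:
            union(first, other)

    # Collect elements by the root of their tree.
    groups = defaultdict(set)
    for element in parent:
        groups[find(element)].add(element)
    return list(groups.values())


coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'],
        ['aaaa', 'bbbb'],
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc', 'aaaa'],
        ['eeee', 'eeef', 'gggg', 'gggi'],
        ['gggg', 'hhhh', 'iiii']]

for group in pool_with_union_find(coll):
    print(sorted(group))
```

Like the set-based versions, this assumes the elements are hashable. Each element is touched a small number of times, so the running time grows roughly with the total number of elements rather than with the square of the number of lists.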
