How to aggregate matching pairs into "connected components" in Python

def get_cliques(pairs):
    # Start from the first pair, then fold each later pair into the first set it touches
    set_list = [set(pairs[0])]
    for pair in pairs[1:]:
        matched = False
        for group in set_list:
            if pair[0] in group or pair[1] in group:
                group.update(pair)
                matched = True
                break
        if not matched:
            set_list.append(set(pair))
    return set_list

pairs = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('f', 'g')]
print(get_cliques(pairs))
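One property of the single-pass approach above is worth seeing in isolation: a pair is folded into the first set that contains either endpoint, so two sets that already exist are never joined when a later pair bridges them. A minimal sketch of that behavior follows; the function name `naive_groups` and the sample input are made up for illustration.

```python
def naive_groups(pairs):
    """Single-pass grouping: each pair joins the first set sharing an endpoint."""
    groups = []
    for a, b in pairs:
        for group in groups:
            if a in group or b in group:
                group.update((a, b))
                break
        else:
            # No existing set contains either endpoint: start a new one.
            groups.append({a, b})
    return groups

# ('b', 'c') bridges the two earlier sets, but they are never merged:
# the result is two overlapping sets, {'a', 'b', 'c'} and {'c', 'd'}.
bridged = naive_groups([('a', 'b'), ('c', 'd'), ('b', 'c')])
print(bridged)
```

With the pairs from the question the result happens to be correct, because every new pair touches the set built so far.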

3 answers

User

Answer 1 · edited 2024-06-28 18:59:53

Use NetworkX:

import networkx as nx
G1=nx.Graph()
G1.add_edges_from([("a","b"),("b","c"),("c","d"),("d","e"),("f","g")])
sorted(nx.connected_components(G1), key = len, reverse=True)

This gives two components (element order within each set may vary):

[{'a', 'b', 'c', 'd', 'e'}, {'f', 'g'}]

You will now have to check which algorithm is fastest...

OP:

This is great! I now have this in my PostgreSQL database. Just organize the pairs into a two-column table, then use array_agg() to pass them to the PL/Python function get_connected(). Thanks.

CREATE OR REPLACE FUNCTION get_connected(
    lhs text[],
    rhs text[])
  RETURNS SETOF text[] AS
$BODY$
    pairs = zip(lhs, rhs)

    import networkx as nx
    G=nx.Graph()
    G.add_edges_from(pairs)
    return sorted(nx.connected_components(G), key = len, reverse=True)

$BODY$ LANGUAGE plpythonu;

(Note: I edited the answer because I thought showing this step might be a useful addendum, but it was too long for a comment.)

User

Answer 2 · edited 2024-06-28 18:59:53

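I don't believe (correct me if I'm wrong) that this is directly related to the largest clique problem. The definition of a clique (Wikipedia) says that a clique "in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge". In this case, we want to find which nodes can reach each other, even indirectly.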
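To make the distinction concrete, here is a small sketch that tests the clique condition directly. The helper `is_clique` and the two sample edge lists are illustrative only and not part of any answer.

```python
from itertools import combinations

def is_clique(nodes, pairs):
    """True if every two distinct nodes are joined directly by an edge."""
    edges = {frozenset(p) for p in pairs}
    return all(frozenset((u, v)) in edges for u, v in combinations(nodes, 2))

path = [('a', 'b'), ('b', 'c')]                  # a-b-c: connected, but a and c share no edge
triangle = [('h', 'i'), ('i', 'j'), ('j', 'h')]  # every two nodes are linked

print(is_clique({'a', 'b', 'c'}, path))      # False
print(is_clique({'h', 'i', 'j'}, triangle))  # True
```

Both node sets are single connected components, yet only the triangle is a clique.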

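I made a small sample. It builds a graph and traverses it looking for neighbors. This should be quite efficient, because each node is traversed only once while the groups are being formed.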

from collections import defaultdict

def get_cliques(pairs):
    # Build a graph using the pairs
    nodes = defaultdict(lambda: [])
    for a, b in pairs:
        if b is not None:
            nodes[a].append((b, nodes[b]))
            nodes[b].append((a, nodes[a]))
        else:
            nodes[a]  # empty list

    # Add all neighbors to the same group    
    visited = set()
    def _build_group(key, group):
        if key in visited:
            return
        visited.add(key)
        group.add(key)
        for neighbor, _ in nodes[key]:
            _build_group(neighbor, group)

    groups = []
    for key in nodes.keys():
        if key in visited: continue
        groups.append(set())
        _build_group(key, groups[-1])

    return groups

if __name__ == '__main__':
    pairs = [
        ('a', 'b'), ('b', 'c'), ('b', 'd'), # a "tree"
        ('f', None),                        # no relations
        ('h', 'i'), ('i', 'j'), ('j', 'h')  # circular
    ]
    print(get_cliques(pairs))
    # Output (element order within each set may vary):
    # [{'a', 'b', 'c', 'd'}, {'f'}, {'h', 'i', 'j'}]
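Because `_build_group` recurses once per node along a path, a long chain of pairs can exceed Python's default recursion limit. Below is a minimal sketch of the same traversal with an explicit stack, assuming that `None` marks "no relation" as in the sample above. The name `get_components` is hypothetical.

```python
from collections import defaultdict

def get_components(pairs):
    """Group nodes that can reach each other, using an explicit stack."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a]  # register a even when it has no relation
        if b is not None:
            neighbors[a].add(b)
            neighbors[b].add(a)

    visited = set()
    groups = []
    for start in neighbors:
        if start in visited:
            continue
        visited.add(start)
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            group.add(node)
            for nxt in neighbors[node]:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups

# A chain much longer than the default recursion limit still forms one group.
chain = [(i, i + 1) for i in range(5000)]
print(len(get_components(chain)))  # 1
```

Each node is still pushed and visited once, so the cost stays proportional to the number of nodes plus pairs.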

If your dataset is best modeled as a graph and is very large, perhaps a graph database such as Neo4j would be appropriate?

User

Answer 3 · edited 2024-06-28 18:59:53

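DSM's comment got me looking for set consolidation algorithms in Python. Rosetta Code has two versions of the same algorithm. Example usage (non-recursive version):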

# Same sample pairs as in the previous answer's example
pairs = [('a', 'b'), ('b', 'c'), ('b', 'd'), ('f', None), ('h', 'i'), ('i', 'j'), ('j', 'h')]

# Copied from Rosetta Code
def consolidate(sets):
    setlist = [s for s in sets if s]
    for i, s1 in enumerate(setlist):
        if s1:
            for s2 in setlist[i+1:]:
                intersection = s1.intersection(s2)
                if intersection:
                    s2.update(s1)
                    s1.clear()
                    s1 = s2
    return [s for s in setlist if s]

print(consolidate([set(pair) for pair in pairs]))
# Output (element order within each set may vary):
# [{'a', 'b', 'c', 'd'}, {'f', None}, {'h', 'i', 'j'}]
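The consolidation loop above compares each set with every later set, so the work can grow roughly quadratically with the number of pairs. One common alternative for this kind of merging is a disjoint-set (union-find) structure. The sketch below does not come from Rosetta Code or from the answers here, and the names `connected_groups` and `find` are illustrative only.

```python
def connected_groups(pairs):
    """Merge pairs into groups with a disjoint-set (union-find) forest."""
    parent = {}

    def find(x):
        # Locate the root, then compress the path for later lookups.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two trees

    # Collect every node under its root.
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

question_pairs = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('f', 'g')]
print(connected_groups(question_pairs))
```

Like `consolidate`, this treats `None` as an ordinary element, so a pair such as `('f', None)` yields a group containing both values.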
