Python的图形工具多处理计算ego网

2024-09-24 22:26:21 发布

男 | 程序猿一只，喜欢编程写python代码。

我正试图为一个相对较大的网络（8M个节点和17M条边）中的250k个特殊节点计算ego网络。由于每个特殊节点的切割过程需要3秒钟，因此我决定使用multiprocessing：

from graph_tool import Graph, GraphView
from graph_tool.topology import shortest_distance
from graph_tool import load_graph
import multiprocessing
import time

NO_PROC = 4
DEGREE = 4 # neighbours of n-th degree
NO_SPECIAL_NODES = 250000

graph = load_graph('./graph.graphml') #8M nodes, 17M edges

def ego_net(g, ego, n):
    print("Graph's identity: {}.".format(id(g))) # check if you use the same object as in other calls
    d = shortest_distance(g=g, source=ego, max_dist=n) #O(V+E)
    u = GraphView(g, vfilt=d.a < g.num_vertices()) #O(V)
    u = Graph(u, prune=True)
    return (ego, u)

if __name__ == "__main__":
    # generate arguments
    data = [(graph, node, DEGREE) for node in range(0, NO_SPECIAL_NODES)]

    # choose forking strategy explicitly
    ctx = multiprocessing.get_context('fork')
    pool = ctx.Pool(NO_PROC)

    results = pool.starmap(ego_net, [piece for piece in data])

这种方法的问题是，即使我显式地选择了fork{a1}，子流程也不会使用graph对象，而是将大对象复制到每个子流程中。这种方法导致MemoryError，因为我无法为graph的副本提供足够的RAM

我知道multiprocessing中定义的data structures支持进程之间的共享。它们似乎不支持Graph等graph是其实例的复杂对象。有没有办法只加载一次graph并让所有进程都使用它？我确信ego_net函数从graph读取，并且不会以任何方式修改对象

Tags：对象 no in from import 网络 data net

0条回答

目前没有回答

Python的图形工具多处理计算ego网

相关问题更多 >

编程相关推荐

热门问题

热门文章

Python的图形工具多处理计算ego网

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >