How do I compute personalized PageRank on a graph with a million nodes?

# x_0 is a column vector of all zeros, except a 1 in the position corresponding to node n
# adjacency_matrix is a matrix with a 1 in position (i, j) if there is an edge from node i to node j
x_1 = 0.5 * x_0 + 0.5 * adjacency_matrix * x_0
x_2 = 0.5 * x_0 + 0.5 * adjacency_matrix * x_1
x_3 = 0.5 * x_0 + 0.5 * adjacency_matrix * x_2
# x_3 now holds the personalized PageRank scores
# i'm basically approximating the personalized PageRank by running this for only 3 iterations

2 Answers

User

#1 · edited 2024-09-27 07:35:36

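I would have thought the PageRank algorithm is best viewed as operating on a directed graph (http://en.wikipedia.org/wiki/Directed_graph), possibly with appropriate weights.

I like the networkx library, located at http://networkx.lanl.org

You'll find it also includes a PageRank example under its algorithms, which you may be able to adapt.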

User

#2 · edited 2024-09-27 07:35:36

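In your case, if the data is stored the right way, the iterative method that simulates a random walk should work well. When there are very few edges compared to the number of nodes (as in your example), I don't think the matrix approach is a good choice. The matrix is very sparse, yet a plain matrix multiplication effectively checks, for every pair of nodes i and j, whether an edge from i to j exists. (By the way, I'm not sure how much run time those multiplications by zero really cost.)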
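To put rough numbers on the sparsity argument, here is a back-of-the-envelope count. It assumes n = 1,000,000 nodes (from the question title) and about 10 outgoing links per node (the figure k = 10 used later in this answer); the variable names are only for illustration, and a sparse matrix library would avoid the dense cost as well.

```python
n = 1_000_000  # number of nodes (from the question title)
k = 10         # assumed outgoing links per node (the k used later in this answer)

# A dense matrix-vector product touches every (i, j) entry once.
dense_ops = n * n
# Storing only existing edges means one update per edge.
sparse_ops = n * k

print(dense_ops)                # 1000000000000
print(sparse_ops)               # 10000000
print(dense_ops // sparse_ops)  # 100000
```

So under these assumptions, a dense multiply does about 100,000 times more work per iteration than walking the edge lists.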

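If your data is stored so that each node object holds a list of the destinations of its outgoing links, the random walk simulation will be quite fast. Ignoring the damping factor, this is what you actually do in each iteration of the random walk simulation: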

# Each node splits its current score evenly across its outgoing links.
for node in nodes:
    for destination in node.destinations:
        destination.pageRank += node.pageRank / len(node.destinations)

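Each iteration has time complexity O(n*k), where n = 1M and k = 10, which is about 10 million updates per iteration. That sounds fine, unless I'm missing something.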
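For what it's worth, here is a minimal sketch of how that loop might look with the damping factor put back in, using the 0.5 weighting and 3 iterations from the question. It is not the asker's exact matrix formula: it follows the splitting rule above (each node divides its score among its outgoing links). The function name `personalized_pagerank`, the `out_links` dict, and the tiny example graph are all hypothetical. Scores are written to a fresh dict on each pass, so updates made during a pass do not feed back into it, and only nodes reached so far are ever touched.

```python
from collections import defaultdict


def personalized_pagerank(out_links, source, damping=0.5, iterations=3):
    """Approximate personalized PageRank for `source` by a few iterations.

    out_links maps each node to the list of nodes it links to.
    """
    scores = {source: 1.0}
    for _ in range(iterations):
        new_scores = defaultdict(float)
        # Restart step: a (1 - damping) share always returns to the source.
        new_scores[source] += 1.0 - damping
        for node, score in scores.items():
            destinations = out_links.get(node, [])
            if not destinations:
                continue  # dangling node: its score is dropped in this sketch
            # Split the damped score evenly across outgoing links.
            share = damping * score / len(destinations)
            for destination in destinations:
                new_scores[destination] += share
        scores = dict(new_scores)
    return scores


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = personalized_pagerank(graph, "a")
print(ranks)
```

Because the walk starts from a single node, after 3 iterations with about 10 links per node at most roughly 1 + 10 + 100 + 1000 nodes can have a nonzero score, so each pass is far cheaper than a sweep over all million nodes.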
