How can I speed up a large np.add.outer on big matrices?

Posted 2024-05-20 00:05:15


I originally posted a question about computing logsumexp:

How to efficiently compute logsumexp of upper triangle in a nested loop?

The answer I accepted was:

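import numpy as np
from scipy.special import logsumexp  # logsumexp lives in SciPy; NumPy has no np.logsumexp

Wm  = np.array([[1,   2,   3],
                [4,   5,   6],
                [7,   8,   9],
                [10, 11,  12]])

# Example for a single pair of rows (x=0, y=1)
wx  = np.array([1,   2,   3])
wy  = np.array([4,   5,   6])

# Wxy = np.add.outer(wx, wy)
Wxy = np.array([[5,   6,   7],
                [6,   7,   8],
                [7,   8,   9]])

'''
triu_inds = np.triu_indices(3, k=1) = ([0, 0, 1], [1, 2, 2])
Wxy[triu_inds] = [6, 7, 8]
logsumexp(Wxy[triu_inds]) = log(exp(6) + exp(7) + exp(8))
'''

n = Wm.shape[0]                                # number of rows in Wm
triu_inds = np.triu_indices(Wm.shape[1], k=1)  # strict upper triangle of each outer sum
W = np.zeros((n, n))

# original nested loop
for x in range(n-1):
    wx = Wm[x, :]
    for y in range(x+1, n):
        wy = Wm[y, :]
        Wxy = np.add.outer(wx, wy)
        Wxy = Wxy[triu_inds]
        W[x, y] = logsumexp(Wxy)

# solution here
W = logsumexp(
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),)*2 + triu_inds],
    axis=-1  # Reduce over the last axis (the upper-triangle entries).
)
W = np.triu(W, k=1)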
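
To make the size of the intermediate array concrete, here is a small sketch that inspects the shapes produced by the vectorized version on the toy matrix. The helper name `outer_nbytes` is my own (hypothetical) and assumes 8-byte elements.

```python
import numpy as np

Wm = np.arange(1, 13).reshape(4, 3)            # same toy matrix as above
triu_inds = np.triu_indices(Wm.shape[1], k=1)  # strict upper triangle, 3 index pairs

full = np.add.outer(Wm, Wm)                    # shape (m, n, m, n)
picked = full.swapaxes(1, 2)[(slice(None),) * 2 + triu_inds]  # shape (m, m, n*(n-1)//2)

def outer_nbytes(m, n, itemsize=8):
    """Bytes needed for the (m, n, m, n) intermediate (hypothetical helper)."""
    return (m * n) ** 2 * itemsize

print(full.shape, picked.shape)
print(outer_nbytes(1000, 200) / 1e9, "GB")
```

The full four-dimensional array is built before any upper-triangle selection happens, so its size depends on the whole of `Wm`, not on how many entries are eventually kept.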

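The problem is that this is very slow for large matrices, because the size of the intermediate array blows up quickly. If Wm has shape (m, n), the memory required grows as (m*n)**2 * 8 bytes, which is about 320 GB for a 1000x200 matrix. I need to run this on matrices larger than 1000x200, but I get a MemoryError, and it is very slow.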
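
One possible way around the memory blow-up is sketched below. It is not the accepted answer's method but a different technique that avoids `np.add.outer` entirely. It relies on the identity sum over i<j of exp(a_i + b_j) = sum over i of exp(a_i) * (sum over j>i of exp(b_j)), so the inner sum becomes a suffix sum and the outer sum becomes a matrix product. The function name `pairwise_triu_logsumexp` is hypothetical, and the per-row max shift is an assumption about how to keep the exponentials in range.

```python
import numpy as np
from scipy.special import logsumexp  # only used for the reference check below

def pairwise_triu_logsumexp(Wm):
    """W[x, y] = logsumexp over i < j of Wm[x, i] + Wm[y, j], kept for x < y.

    Hypothetical helper; needs O(m*n + m*m) memory instead of O((m*n)**2).
    """
    Wm = np.asarray(Wm, dtype=float)
    row_max = Wm.max(axis=1, keepdims=True)        # (m, 1) shift for stability
    E = np.exp(Wm - row_max)                       # (m, n), values in (0, 1]

    # S[y, i] = sum over j > i of E[y, j]  (exclusive suffix sum per row)
    S = np.zeros_like(E)
    S[:, :-1] = np.cumsum(E[:, :0:-1], axis=1)[:, ::-1]

    with np.errstate(divide="ignore"):             # log(0) -> -inf if no pair survives
        W = np.log(E @ S.T) + row_max + row_max.T  # (m, m)
    return np.triu(W, k=1)

# Compare against the nested loop on the toy matrix from the question.
Wm = np.arange(1, 13).reshape(4, 3)
triu_inds = np.triu_indices(Wm.shape[1], k=1)
ref = np.zeros((4, 4))
for x in range(3):
    for y in range(x + 1, 4):
        ref[x, y] = logsumexp(np.add.outer(Wm[x], Wm[y])[triu_inds])

print(np.allclose(pairwise_triu_logsumexp(Wm), ref))
```

For a 1000x200 input the largest arrays here are the two (m, n) work arrays and the (m, m) result, a few megabytes in total. One caveat: because each row is shifted by its own maximum rather than by the maximum over the allowed i<j pairs, rows with a very wide range of values can underflow to zero, so this is less robust than calling `logsumexp` on each pair directly.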

