矩阵乘法、fast和spars的子集

def sparse_mult_notreally(a, b, coords): rows, cols = coords rows, r_idx = np.unique(rows, return_inverse=True) cols, c_idx = np.unique(cols, return_inverse=True) C = np.array(np.dot(a[rows, :], b[:, cols])) # this operation is dense return sp.coo_matrix( (C[r_idx,c_idx],coords), (a.shape[0],b.shape[1]) )

def sparse_mult(a, b, coords): rows, cols = coords n = len(rows) C = np.array([ float(a[rows[i],:]*b[:,cols[i]]) for i in range(n) ]) # this is sparse, but VERY slow return sp.coo_matrix( (C,coords), (a.shape[0],b.shape[1]) )

3条回答

网友

1楼 · 编辑于 2024-09-27 21:26:00

根据评论中的额外信息，我认为让您感到困惑的是对np.unique的调用。尝试以下方法：

import numpy as np
import scipy.sparse as sps
from numpy.core.umath_tests import inner1d

n = 100000
x = np.random.rand(n, 10)
theta = np.random.rand(n, 10)
rows = np.arange(n)
cols = np.arange(n)
np.random.shuffle(rows)
np.random.shuffle(cols)


def sparse_multiply(x, theta, rows, cols):
    data = inner1d(x[rows], theta[cols])
    return sps.coo_matrix((data, (rows, cols)),
                          shape=(x.shape[0], theta.shape[0]))

我得到以下时间安排：

^{pr2}$

当然，对于n = 100：

>>> np.allclose(sparse_multiply(x, theta, rows, cols).toarray()[rows, cols],
                x.dot(theta.T)[rows, cols])
>>> True

网友

2楼 · 编辑于 2024-09-27 21:26:00

还没有测试詹姆的答案（再次感谢！），但我用cython实现了另一个同时有效的答案。在

文件sparse_mult_c.pyx：

# working from tutorial at: http://docs.cython.org/src/tutorial/numpy.html
cimport numpy as np

# Turn bounds checking back on if there are ANY problems!
cimport cython
@cython.boundscheck(False) # turn of bounds-checking for entire function
def sparse_mult_c(np.ndarray[np.float64_t, ndim=2] a,
                  np.ndarray[np.float64_t, ndim=2] b,
                  np.ndarray[np.int32_t, ndim=1] rows,
                  np.ndarray[np.int32_t, ndim=1] cols,
                  np.ndarray[np.float64_t, ndim=1] C ):

    cdef int n = rows.shape[0]
    cdef int k = a.shape[1]
    cdef int i,j

    for i in range(n):
        for j in range(k):
            C[i] += a[rows[i],j] * b[j,cols[i]]

然后按照http://docs.cython.org/src/userguide/tutorial.html编译它

然后在我的python代码中，我包括以下内容：

^{pr2}$

这是完全稀疏的，运行速度甚至比原来的（内存效率低）解决方案还要快。在

网友

3楼 · 编辑于 2024-09-27 21:26:00

我为您的问题分析了4种不同的解决方案，看起来对于任何大小的数组，numbajit解决方案都是最好的。紧随其后的是@Alexander的cython解决方案。在

以下是结果（M是x数组中的行数）：

M = 1000
function sparse_dense    took 0.03 sec.
function sparse_loop     took 0.07 sec.
function sparse_numba    took 0.00 sec.
function sparse_cython   took 0.09 sec.
M = 10000
function sparse_dense    took 2.88 sec.
function sparse_loop     took 0.68 sec.
function sparse_numba    took 0.00 sec.
function sparse_cython   took 0.01 sec.
M = 100000
function sparse_dense    ran out of memory
function sparse_loop     took 6.84 sec.
function sparse_numba    took 0.09 sec.
function sparse_cython   took 0.12 sec.

我用来分析这些方法的脚本是：

^{pr2}$

cython函数是@Alexander代码的一个稍微修改的版本：

# working from tutorial at: http://docs.cython.org/src/tutorial/numpy.html
cimport numpy as np

# Turn bounds checking back on if there are ANY problems!
cimport cython
@cython.boundscheck(False) # turn of bounds-checking for entire function
def sparse_mult_c(np.ndarray[np.float64_t, ndim=2] a,
                  np.ndarray[np.float64_t, ndim=2] b,
                  np.ndarray[np.float64_t, ndim=1] data,
                  np.ndarray[np.int32_t, ndim=1] rows,
                  np.ndarray[np.int32_t, ndim=1] cols,
                  np.ndarray[np.float64_t, ndim=1] out):

    cdef int n = rows.shape[0]
    cdef int k = a.shape[1]
    cdef int i,j

    cdef double num

    for i in range(n):
        num = 0.0
        for j in range(k):
            num += a[rows[i],j] * b[j,cols[i]]
        out[i] = data[i]*num

相关问题更多 >

编程相关推荐

热门问题

热门文章