<h2>If you want to do this without any memory overhead (in place)</h2>
<p>Always think about how the data is actually stored.
A small example using a <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html" rel="nofollow noreferrer">csc matrix</a>:</p>
<pre><code>import numpy as np
from scipy import sparse

shape=(5,5)
X=sparse.random(shape[0], shape[1], density=0.5, format='csc')
print(X.todense())
#[[0.12146814 0.         0.         0.04075121 0.28749552]
# [0.         0.92208639 0.         0.44279661 0.        ]
# [0.63509196 0.42334964 0.         0.         0.99160443]
# [0.         0.         0.25941113 0.44669367 0.00389409]
# [0.         0.         0.         0.         0.83226886]]
i=0 #first column
#Nonzero values of column i
print(X.data[X.indptr[i]:X.indptr[i+1]])
#[0.12146814 0.63509196]
</code></pre>
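<p>To make the storage layout concrete, here is a minimal sketch that builds the three CSC arrays by hand from a small dense array. The helper name <code>dense_to_csc_arrays</code> and the matrix values are made up for illustration; for a matrix in canonical format the arrays should correspond to what SciPy exposes as <code>X.data</code>, <code>X.indices</code> and <code>X.indptr</code>.</p>

```python
import numpy as np

def dense_to_csc_arrays(dense):
    """Build the CSC arrays (data, indices, indptr) from a 2-D array.

    Hypothetical helper for illustration only, not part of SciPy.
    """
    data, indices, indptr = [], [], [0]
    for col in dense.T:                  # walk the matrix column by column
        rows = np.nonzero(col)[0]        # row numbers of the nonzero entries
        indices.extend(rows.tolist())
        data.extend(col[rows].tolist())
        indptr.append(len(data))         # offset where the next column starts
    return np.array(data), np.array(indices), np.array(indptr)

dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 5.0, 0.0]])
data, indices, indptr = dense_to_csc_arrays(dense)

# Column i occupies data[indptr[i]:indptr[i+1]]
print(data)     # [1. 4. 5. 2. 3.]
print(indices)  # [0 2 2 0 1]
print(indptr)   # [0 2 3 5]
```

<p>Because the values of one column are contiguous in <code>data</code>, a slice of that array is a view, and writing to it changes the matrix itself.</p>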
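<p><strong>A simple solution</strong></p>
<p>So the only thing we have to do here is modify the nonzero entries column by column, in place. This can be done quite easily with a partly vectorized NumPy solution. <code>data</code> is the array that contains all nonzero values, and <code>indptr</code> stores where each column begins and ends.</p>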
<pre><code>def Numpy_csc_norm(data,indptr):
    for i in range(indptr.shape[0]-1):
        #Sum of the nonzero entries of column i
        xs = np.sum(data[indptr[i]:indptr[i+1]])
        #Modify the view in place
        data[indptr[i]:indptr[i+1]]/=xs
</code></pre>
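<p>As a quick sanity check, the loop can be run on hand-written CSC arrays. This is only a sketch: the values are made up, and the function is repeated so that the snippet runs on its own.</p>

```python
import numpy as np

def Numpy_csc_norm(data, indptr):
    # Same loop as above: scale each column's slice of `data` in place.
    for i in range(indptr.shape[0] - 1):
        xs = np.sum(data[indptr[i]:indptr[i+1]])
        data[indptr[i]:indptr[i+1]] /= xs

# CSC arrays of a 3x3 matrix with columns
# [1, 0, 4], [0, 0, 5] and [2, 3, 0] (made-up values).
data = np.array([1.0, 4.0, 5.0, 2.0, 3.0])
indptr = np.array([0, 2, 3, 5])

Numpy_csc_norm(data, indptr)

# Every column should now sum to 1.
col_sums = [data[indptr[i]:indptr[i+1]].sum() for i in range(len(indptr) - 1)]
assert np.allclose(col_sums, 1.0)
```

<p>With a SciPy matrix the call would be <code>Numpy_csc_norm(X.data, X.indptr)</code>. Note that <code>data</code> has to be a floating-point array, because in-place true division on an integer array raises a casting error.</p>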
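<p>Performance-wise, the in-place NumPy solution is already not too bad. If you want to improve performance further, you could use Cython, Numba, or some other compiled code that can be wrapped in Python more or less easily.</p>
<p><strong>Numba solution</strong></p>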
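<pre><code>import numba as nb

@nb.njit(fastmath=True,error_model="numpy",parallel=True)
def Numba_csc_norm(data,indptr):
    #Columns are independent, so they can be processed in parallel
    for i in nb.prange(indptr.shape[0]-1):
        acc=0.
        #Sum of the nonzero entries of column i
        for j in range(indptr[i],indptr[i+1]):
            acc+=data[j]
        #Scale the column in place
        for j in range(indptr[i],indptr[i+1]):
            data[j]/=acc
</code></pre>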
<p><strong>Performance</strong></p>
<pre><code>#Create a not too small example matrix
shape=(50_000,10_000)
X=sparse.random(shape[0], shape[1], density=0.001, format='csc')

#Not in-place from hpaulj
def hpaulj(X):
    acc=X.sum(axis=0)
    return X.multiply(sparse.csr_matrix(1./acc))

%timeit X2=hpaulj(X)
#6.54 ms ± 67.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#The next two variants are in-place,
#but this shouldn't have an influence on the timings
%timeit Numpy_csc_norm(X.data,X.indptr)
#79.2 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

#parallel=False -> faster on tiny matrices
%timeit Numba_csc_norm(X.data,X.indptr)
#626 µs ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

#parallel=True -> faster on larger matrices
%timeit Numba_csc_norm(X.data,X.indptr)
#185 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
</code></pre>