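<p>Rather than converting the <code>sparse</code> matrix to <code>dense</code> (which is discouraged), I would use scikit-learn's <a href="http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html" rel="noreferrer"><code>TruncatedSVD</code></a>, a PCA-like dimensionality reduction algorithm (it uses randomized SVD by default) that works on sparse data:</p>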
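<pre><code>from sklearn.decomposition import TruncatedSVD
# data is the existing sparse matrix; it is reduced to 5 components without densifying it
svd = TruncatedSVD(n_components=5, random_state=42)
data = svd.fit_transform(data)
</code></pre>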
<p>Quoting the <code>TruncatedSVD</code> documentation:</p>
<blockquote>
<p>In particular, truncated SVD works on term count/tf-idf matrices as returned by the vectorizers in sklearn.feature_extraction.text. In that context, it is known as latent semantic analysis (LSA).</p>
</blockquote>
<p>This is exactly your use case.</p>
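<p>For illustration, here is a minimal end-to-end sketch of the LSA setup the documentation describes. The toy corpus and the variable names <code>docs</code>, <code>tfidf</code> and <code>reduced</code> are made up for this example, and <code>n_components=2</code> is chosen only because the corpus is tiny; your own data and component count will differ.</p>

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, used only to illustrate the pipeline.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock prices rose sharply today",
    "investors sold stock as prices fell",
]

# TfidfVectorizer returns a scipy.sparse matrix; it is never converted to dense.
tfidf = TfidfVectorizer().fit_transform(docs)

# Keep n_components well below the vocabulary size (assumed value for this toy data).
svd = TruncatedSVD(n_components=2, random_state=42)
reduced = svd.fit_transform(tfidf)  # dense array of shape (n_documents, n_components)

print(tfidf.shape, "->", reduced.shape)
```

<p>The input stays sparse all the way into <code>fit_transform</code>; only the much smaller reduced output is dense.</p>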