<p>I am calling the jensen_shannon(query, matrix) function below to find the document in the matrix that is most similar to the query document:</p>
<pre><code>import numpy as np
from scipy.stats import entropy

def jensen_shannon(query, matrix):
    """
    Implements the Jensen-Shannon distance between the input query
    (an LDA topic distribution for a document) and the entire corpus
    of topic distributions.
    Returns an array of length M, where M is the number of documents
    in the corpus.
    """
    # keep the p, q notation from above
    p = query[None, :].T  # column vector, shape (num_topics, 1)
    q = matrix.T          # shape (num_topics, num_docs)
    m = 0.5 * (p + q)     # mixture distribution, broadcast to (num_topics, num_docs)
    return np.sqrt(0.5 * (entropy(p, m) + entropy(q, m)))
</code></pre>
<p>Shape of query: (100,)</p>
<p>Shape of matrix: (10804, 100)</p>
<p>Error traceback:</p>
<pre><code>ValueError Traceback (most recent call last)
<ipython-input-103-86cb68dd862d> in <module>
1 # this is surprisingly fast
----> 2 most_sim_ids = get_most_similar_documents(new_doc_distribution,doc_topic_dist)
<ipython-input-102-c0fb95224e87> in get_most_similar_documents(query, matrix, k)
6 print(query.shape)
7 print(matrix.shape)
----> 8 sims = jensen_shannon(query,matrix) # list of jensen shannon distances
9 return sims.argsort()[:k] # the top k positional index of the smallest Jensen Shannon distances
<ipython-input-74-6ffb0ec54e9a> in jensen_shannon(query, matrix)
10 q = matrix.T # transpose matrix
11 m = 0.5*(p + q)
---> 12 return np.sqrt(0.5*(entropy(p,m) + entropy(q,m)))
~/venv/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py in entropy(pk, qk, base, axis)
2668 qk = asarray(qk)
2669 if qk.shape != pk.shape:
-> 2670 raise ValueError("qk and pk must have same shape.")
2671 qk = 1.0*qk / np.sum(qk, axis=axis, keepdims=True)
2672 vec = rel_entr(pk, qk)
ValueError: qk and pk must have same shape.
</code></pre>
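<p>To make the shape mismatch concrete, here is a tiny NumPy sketch with made-up dimensions (4 topics and 3 documents standing in for my 100 and 10804; the query and matrix values are hypothetical stand-ins):</p>

```python
import numpy as np

# Hypothetical stand-ins: 4 topics and 3 documents
query = np.array([0.1, 0.2, 0.3, 0.4])  # like my (100,) query
matrix = np.full((3, 4), 0.25)          # like my (10804, 100) matrix

p = query[None, :].T   # shape (4, 1)
q = matrix.T           # shape (4, 3)
m = 0.5 * (p + q)      # NumPy broadcasting gives shape (4, 3)

# scipy.stats.entropy(p, m) then receives pk with shape (4, 1) and
# qk with shape (4, 3), which do not have the same shape.
print(p.shape, q.shape, m.shape)
```

So the mixture m is broadcast to the full matrix shape while p stays a single column, which is where the "qk and pk must have same shape" check trips.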
<p>I found <a href="https://github.com/scipy/scipy/issues/11325" rel="nofollow noreferrer">Add axis parameter for scipy.spatial.distance.jensenshannon</a>, but the function does not accept an axis parameter.</p>
<p>Does anyone know what I am missing? Any leads are much appreciated. Thanks.</p>
<p>FYI: I am working from this Kaggle notebook:
<a href="https://www.kaggle.com/ktattan/lda-and-document-similarity/data" rel="nofollow noreferrer">https://www.kaggle.com/ktattan/lda-and-document-similarity/data</a></p>