Cosine distances are very large in sklearn nearest-neighbor search

Posted 2024-10-04 05:27:02


I'm running a KNN nearest-neighbor search on data stored as a sparse matrix:

from sklearn import neighbors
nlf = neighbors.NearestNeighbors(n_neighbors=20, algorithm='brute', metric='cosine')

df_csr 

Out[]: <100x2253274 sparse matrix of type '<type 'numpy.int64'>'
with 8105964 stored elements in Compressed Sparse Row format>

trainY = xrange(100)  # optional: NearestNeighbors.fit ignores y
nlf.fit(df_csr, trainY)

sf_csr

Out[]: <1x2253274 sparse matrix of type '<type 'numpy.int64'>'
with 7172 stored elements in Compressed Sparse Row format>

result1 = nlf.kneighbors(sf_csr)

result1[1]+1, result1[0]

Out[230]:
(array([[ 63,  10,  78,  19,  40,  14,  23,  53,  11,  66,  29,  77,  69,
      83,  76,  25, 100,  22,  15,  21]], dtype=int64),
 array([[ 0.98304724,  0.9903958 ,  0.99536581,  0.99604388,  0.99706035,
      0.99749375,  0.99768032,  0.99778807,  0.99779205,  0.99783219,
      0.99822192,  0.9982969 ,  0.99831123,  0.99840337,  0.99849419,
      0.99858861,  0.99861923,  0.99863749,  0.99865913,  0.99875224]]))
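For context, sklearn's metric='cosine' returns cosine distance, defined as 1 minus the cosine similarity. A distance near 1 therefore means a similarity near 0 (nearly orthogonal vectors), which is not by itself a sign of a computation error. A minimal sketch converting the first three distances copied from the output above back into similarities:

```python
import numpy as np

# First three distances copied from result1[0] in the output above
distances = np.array([0.98304724, 0.9903958, 0.99536581])

# cosine distance = 1 - cosine similarity, so subtract from 1 to invert
similarities = 1.0 - distances
print(similarities)
```

So the closest match (shown as 63 above) has a cosine similarity of only about 0.017 with the query.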

The cosine distances are very large: all of them are > 0.983.

In fact, I ran more than 20 query snippets, and for most of them the cosine distances are greater than 0.983.
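One plausible explanation, offered only as an illustration: with 2,253,274 columns, two sparse count vectors that share few nonzero positions are nearly orthogonal, so their cosine distance lands close to 1. The sketch below assumes, hypothetically, that nonzero positions are uniformly random, with nonzero counts taken from the post (7172 for the query, about 8105964 / 100 ≈ 81,060 per training row). Real features are not uniformly random, so this only shows the order of magnitude. random_sparse_row is a hypothetical helper, not part of any library.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
n_features = 2253274  # number of columns reported in the post

def random_sparse_row(nnz):
    """Hypothetical helper: one CSR row with `nnz` positive counts placed at
    uniformly random columns (an assumption, not the asker's real data)."""
    cols = rng.choice(n_features, size=nnz, replace=False)
    data = rng.integers(1, 10, size=nnz)  # counts in 1..9
    rows = np.zeros(nnz, dtype=np.int64)
    return csr_matrix((data, (rows, cols)), shape=(1, n_features))

query = random_sparse_row(7172)       # same nnz as sf_csr in the post
train_row = random_sparse_row(81060)  # ~ average nnz per row of df_csr

# Few shared columns -> dot product is tiny relative to the norms
d_random = cosine_distances(query, train_row)[0, 0]
# Sanity check: a vector compared with itself has distance 0
d_self = cosine_distances(query, query)[0, 0]
print(f"random pair: {d_random:.4f}, vector with itself: {d_self:.4f}")
```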

Are these results reasonable? Am I missing something? Does sklearn compute cosine distance (and cosine similarity) correctly?
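One way to check sklearn's arithmetic is to recompute each distance by hand as 1 - (a·b) / (‖a‖ ‖b‖) and compare it with what NearestNeighbors(metric='cosine', algorithm='brute') returns. The sketch below uses small random sparse matrices as stand-ins for df_csr and sf_csr (an assumption). random_sparse and manual_cosine_distance are hypothetical helpers written for this illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import norm as sparse_norm
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def random_sparse(n_rows, n_cols, density):
    """Hypothetical helper: nonnegative CSR matrix with roughly `density` nonzeros."""
    mask = rng.random((n_rows, n_cols)) < density
    return sp.csr_matrix(rng.random((n_rows, n_cols)) * mask)

# Small stand-ins for df_csr (training rows) and sf_csr (one query row)
X = random_sparse(100, 2000, 0.05)
q = random_sparse(1, 2000, 0.05)

nn = NearestNeighbors(n_neighbors=20, algorithm="brute", metric="cosine")
nn.fit(X)
dist, idx = nn.kneighbors(q)

def manual_cosine_distance(a, b):
    """Hypothetical helper: 1 - (a . b) / (||a|| * ||b||) for two 1-row sparse matrices."""
    dot = a.multiply(b).sum()
    return 1.0 - dot / (sparse_norm(a) * sparse_norm(b))

manual = np.array([manual_cosine_distance(q, X[i]) for i in idx[0]])
print(np.allclose(dist[0], manual))  # True when sklearn matches the formula
```

The same comparison can be run with df_csr and sf_csr in place of X and q; everything stays sparse, so the 2253274 columns are not a problem.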

Please help.

