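<p>I think you need to compute a similarity measure between the song titles in df1 and df2. I tried this with cosine similarity on a couple of made-up song lists. <code>TfidfVectorizer</code> L2-normalizes each row by default, so the dot product of two rows is their cosine similarity, and a higher value means a closer match:</p>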
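<pre><code>from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer(min_df=1)
Song1 = ["macarena bayside boys mix", "cant you hear my heart beat", "crying in the chapell", "you were on my mind"]
Song2 = ["cause im a man", "macarena", "beat from my heart"]

dist_dict = {}   # highest cosine similarity seen so far for each title in Song1
match_dict = {}  # title in Song2 that produced that similarity
for i in Song1:
    for j in Song2:
        # Fit TF-IDF on just this pair; rows are L2-normalized by default,
        # so the off-diagonal entry of tfidf @ tfidf.T is the cosine similarity
        tfidf = vect.fit_transform([i, j])
        similarity = (tfidf @ tfidf.T).toarray()[0, 1]
        # Record the first candidate, then keep whichever candidate scores higher
        if i not in dist_dict or similarity > dist_dict[i]:
            dist_dict[i] = similarity
            match_dict[i] = j
</code></pre>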
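<p>Here is a variant sketch that differs from the loop above. It fits a single vectorizer on both lists so that every title shares one vocabulary and one IDF weighting. It then computes the whole similarity matrix at once with scikit-learn's <code>cosine_similarity</code> and takes the best column for each row. The IDF weights now come from all titles together, so the individual scores differ from the per-pair loop. On this sample the best matches come out the same.</p>
<pre><code>from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Song1 = ["macarena bayside boys mix", "cant you hear my heart beat", "crying in the chapell", "you were on my mind"]
Song2 = ["cause im a man", "macarena", "beat from my heart"]

# Fit one shared vocabulary/IDF over both lists so all vectors live in the same space
vect = TfidfVectorizer(min_df=1)
vect.fit(Song1 + Song2)
X1 = vect.transform(Song1)
X2 = vect.transform(Song2)

# sim[r, c] is the cosine similarity between Song1[r] and Song2[c]
sim = cosine_similarity(X1, X2)
best = sim.argmax(axis=1)  # column index of the best Song2 match for each Song1 title

match_dict = {s: Song2[k] for s, k in zip(Song1, best)}
dist_dict = {s: sim[r, k] for r, (s, k) in enumerate(zip(Song1, best))}

for s in Song1:
    print(f"{s!r}: {match_dict[s]!r} ({dist_dict[s]:.3f})")
</code></pre>
<p>The similarity matrix has len(Song1) × len(Song2) entries, so memory grows with the product of the two list sizes. A title with no words in common with any candidate (here "crying in the chapell") scores 0 everywhere and falls back to the first column.</p>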
<p><a href="https://i.stack.imgur.com/c8q9L.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/c8q9L.png" alt="Best match and their cosine distance"/></a></p>
<p>Once you have found the best match for each song, you can look up its song ID in df2.</p>
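<p>Below is a minimal pandas sketch of that lookup. The excerpt does not show df1 or df2, so the column names <code>title</code> and <code>song_id</code> and the ID values are hypothetical. The idea is to map each df1 title to its matched df2 title and then merge on that title to pull in the ID.</p>
<pre><code>import pandas as pd

# Hypothetical stand-ins for df1/df2: the column names "title" and "song_id"
# and the ID values are assumptions for illustration only.
df1 = pd.DataFrame({"title": ["macarena bayside boys mix", "cant you hear my heart beat"]})
df2 = pd.DataFrame({"song_id": [101, 102, 103],
                    "title": ["cause im a man", "macarena", "beat from my heart"]})

# Best matches from the matching step: each df1 title mapped to a df2 title
match_dict = {"macarena bayside boys mix": "macarena",
              "cant you hear my heart beat": "beat from my heart"}

# Attach the matched df2 title to each df1 row, then join on it to pull in song_id
df1["matched_title"] = df1["title"].map(match_dict)
result = df1.merge(df2.rename(columns={"title": "matched_title"}),
                   on="matched_title", how="left")
print(result)
</code></pre>
<p>If df2 contains duplicate titles, the merge returns one row per duplicate. Deduplicate df2 first if its titles are not unique.</p>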