Step 1: Merge whatever can be merged
In [67]: df1
Out[67]:
           Song    Artist
0        mysong  myartist
1  like a virgi   madonna
In [68]: df2
Out[68]:
            Song  Rank
0         mysong     1
1  like a virgin     2
In [69]: merged = pd.merge(df1, df2, on='Song')
In [70]: merged
Out[70]:
     Song    Artist  Rank
0  mysong  myartist     1
Step 2: Find what is left over
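The code for this step did not survive in the text above. As a minimal sketch, assuming `unmerged` is meant to hold the rows of `df2` whose `Song` found no exact partner in Step 1, an `isin` filter is one way to get them. The original answer may have done this differently.

```python
import pandas as pd

# Sample frames copied from Step 1 (the truncated 'like a virgi' is intentional).
df1 = pd.DataFrame({'Song': ['mysong', 'like a virgi'],
                    'Artist': ['myartist', 'madonna']})
df2 = pd.DataFrame({'Song': ['mysong', 'like a virgin'],
                    'Rank': [1, 2]})

merged = pd.merge(df1, df2, on='Song')

# Assumption: "left over" means rows of df2 with no exact match in merged.
# .copy() avoids SettingWithCopyWarning when columns are added later.
unmerged = df2[~df2['Song'].isin(merged['Song'])].copy()
print(unmerged)
```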
Step 3: Use difflib's get_close_matches to get the closest match
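The code for this step is also missing from the text. The sketch below shows one way the `closest_song` column used in Step 4 could be built. The helper name `closest_song` and the `cutoff` value are assumptions for illustration; `difflib.get_close_matches` itself is the standard-library function named in the heading.

```python
import difflib
import pandas as pd

df1 = pd.DataFrame({'Song': ['mysong', 'like a virgi'],
                    'Artist': ['myartist', 'madonna']})
# Leftover row from Step 2, rebuilt here so the example stands alone.
unmerged = pd.DataFrame({'Song': ['like a virgin'], 'Rank': [2]}, index=[1])

def closest_song(song, candidates, cutoff=0.6):
    """Return the best fuzzy match for song among candidates, or None."""
    matches = difflib.get_close_matches(song, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

candidates = df1['Song'].tolist()
unmerged['closest_song'] = unmerged['Song'].apply(
    lambda song: closest_song(song, candidates))
print(unmerged)
```

Returning `None` when nothing clears the cutoff keeps unrelated titles from being paired by accident; raising `cutoff` makes the matching stricter.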
Step 4: If needed, get the similarity score (a ratio from 0 to 1)
In [77]: def similar(a, b):
    ...:     return difflib.SequenceMatcher(None, a, b).ratio()
In [78]: unmerged['Similarity'] = unmerged.apply(lambda row: similar(row['closest_song'], row['Song']), axis=1)
In [79]: unmerged
Out[79]:
            Song  Rank  closest_song  Similarity
1  like a virgin   2.0  like a virgi        0.96
Step 5: Merge using the closest value
In [80]: unmerged.rename(columns={'Song': 'Old_Song', 'closest_song': 'Song'}, inplace=True)
In [81]: new = unmerged.merge(df1, on='Song')[['Song', 'Artist', 'Rank']]
Out[81]:
           Song    Artist  Rank
0  like a virgi   madonna   2.0
In [82]: pd.concat([merged, new])
Out[82]:
           Song    Artist  Rank
0        mysong  myartist   1.0
0  like a virgi   madonna   2.0
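The five steps can be folded into one helper. This is only a sketch under stated assumptions: `fuzzy_merge` is a hypothetical name, the default `cutoff` of 0.6 is simply difflib's own default, and `ignore_index=True` is added so the result does not carry the duplicate index `0` seen in the output above.

```python
import difflib
import pandas as pd

def fuzzy_merge(left, right, key, cutoff=0.6):
    """Exact-merge on key, then fuzzy-match whatever rows of right are left."""
    merged = pd.merge(left, right, on=key)

    # Rows of right with no exact partner, and the left keys still unclaimed.
    leftover = right[~right[key].isin(merged[key])].copy()
    candidates = left.loc[~left[key].isin(merged[key]), key].tolist()

    def closest(value):
        hits = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
        return hits[0] if hits else None

    # Replace each leftover key with its closest candidate; drop the misses.
    leftover[key] = leftover[key].map(closest)
    leftover = leftover.dropna(subset=[key])

    fuzzy = pd.merge(left, leftover, on=key)
    return pd.concat([merged, fuzzy], ignore_index=True)

df1 = pd.DataFrame({'Song': ['mysong', 'like a virgi'],
                    'Artist': ['myartist', 'madonna']})
df2 = pd.DataFrame({'Song': ['mysong', 'like a virgin'],
                    'Rank': [1, 2]})

result = fuzzy_merge(df1, df2, 'Song')
print(result)
```

Restricting `candidates` to keys that were not already matched exactly is a design choice made here so that an exact match is never reused for a fuzzy one.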