I have the following data.
Input df:
fruit uniqueid
apple 1123
appless 321
banana 623
mango 739
mangos 889
Code:
import pandas as pd
from fuzzywuzzy import fuzz  # assumed source of `fuzz`; thefuzz exposes the same functions

df.loc[:, 'fruit_copy'] = df['fruit']

# compare every value in the column with every other value
compare = pd.MultiIndex.from_product([df['fruit'], df['fruit_copy']]).to_series()

def metrics(tup):
    # tup is a (fruit, fruit_copy) pair; score it with both ratios
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

compare = compare.apply(metrics)

# keep only the stronger matches
compare_80 = compare[(compare['ratio'] >= 80) & (compare['token'] >= 80)]
Current output:
                 ratio  token
apple   apple      100    100
        appless     83     83
appless apple       83     83
        appless    100    100
banana  banana     100    100
mango   mango      100    100
        mangos      91     91
mangos  mango       91     91
        mangos     100    100
Expected output, first goal:
index1         index2   ratio  token  uniqueid
apple    1123  apple      100    100      1123
               appless     83     83       321
appless   321  apple       83     83      1123
               appless    100    100       321
banana    623  banana     100    100       623
mango     739  mango      100    100       739
               mangos      91     91       889
mangos    889  mango       91     91       739
               mangos     100    100       889
Expected output, second goal:
index1         index2   ratio  token  uniqueid
apple    1123  appless     83     83       321
mango     739  mangos      91     91       889
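Can I achieve this by attaching uniqueid to the MultiIndex?
You could try doing this with a cross merge instead, then applying the fuzz ratios to each resulting pair of rows.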
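A minimal sketch of that cross-merge idea might look like the following. It is not the original answer's code, which is not included here. It assumes pandas 1.2 or later (for `how="cross"`) and that `fuzz` comes from thefuzz; the `pos` helper column, the `_1`/`_2` suffixes, and the difflib fallback are illustrative choices, and the fallback only approximates the library's scores.

```python
import pandas as pd

try:
    # Assumption: `fuzz` is the module of that name from thefuzz
    # (fuzzywuzzy exposes the same two functions).
    from thefuzz import fuzz
    ratio, token_sort_ratio = fuzz.ratio, fuzz.token_sort_ratio
except ImportError:
    # Rough standard-library stand-in so the sketch still runs without thefuzz.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return round(100 * SequenceMatcher(None, a, b).ratio())

    def token_sort_ratio(a, b):
        return ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))

df = pd.DataFrame({
    "fruit": ["apple", "appless", "banana", "mango", "mangos"],
    "uniqueid": [1123, 321, 623, 739, 889],
})

# Keep each row's position so unordered duplicates can be dropped later.
left = df.rename_axis("pos").reset_index()

# Cross merge pairs every row with every row; the suffixes keep both
# sides' fruit and uniqueid columns side by side.
pairs = left.merge(left, how="cross", suffixes=("_1", "_2"))

# Score each (fruit_1, fruit_2) pair.
pairs["ratio"] = [ratio(a, b) for a, b in zip(pairs["fruit_1"], pairs["fruit_2"])]
pairs["token"] = [token_sort_ratio(a, b) for a, b in zip(pairs["fruit_1"], pairs["fruit_2"])]

cols = ["fruit_1", "uniqueid_1", "fruit_2", "ratio", "token", "uniqueid_2"]

# First goal: all pairs scoring at least 80 on both metrics.
goal1 = pairs.loc[(pairs["ratio"] >= 80) & (pairs["token"] >= 80), cols]

# Second goal: same threshold, but drop self-matches and mirrored pairs
# by keeping only rows where the left position comes first.
keep = (pairs["ratio"] >= 80) & (pairs["token"] >= 80) & (pairs["pos_1"] < pairs["pos_2"])
goal2 = pairs.loc[keep, cols]

print(goal1.set_index(["fruit_1", "uniqueid_1", "fruit_2"]))
print(goal2.set_index(["fruit_1", "uniqueid_1", "fruit_2"]))
```

Because both sides of the merge carry their own uniqueid, there is no need to attach it to a MultiIndex beforehand. With the sample data, the second frame keeps only the apple/appless and mango/mangos rows.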