How to get 1:1 matches using sklearn K-nearest neighbors

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

dfA = pd.DataFrame(np.array([[1, 1, 1, 1], [1, 1, 2, 2], [4, 5, 2, 0], [8, 8, 8, 8]]),
                   columns=['interest0', 'interest2', 'interest3', 'interest4'],
                   index=['personA0', 'personA1', 'personA2', 'personA3'])
dfB = pd.DataFrame(np.array([[1, 1, 1, 1], [1, 1, 1, 2], [2, 3, 2, 2], [8, 6, 8, 8]]),
                   columns=['interest0', 'interest2', 'interest3', 'interest4'],
                   index=['personB0', 'personB1', 'personB2', 'personB3'])

# my_dist is a custom distance function; its definition is not shown here
knn = NearestNeighbors(n_neighbors=1, metric=my_dist).fit(dfA)
distances, indices = knn.kneighbors(dfB)

>>> dfA
          drink  interest2  interest3  interest4
personA0      1          1          1          1
personA1      1          1          2          2
personA2      4          5          2          0
personA3      8          8          8          8

>>> dfB
          drink  interest2  interest3  interest4
personB0      1          1          1          1
personB1      1          1          1          2
personB2      2          3          2          2
personB3      8          6          8          8

>>> print("Distances\n\n", distances, "\n\nIndices\n\n", indices)
Distances

 [[0.   ]
 [0.125]
 [1.125]
 [0.5  ]]

Indices

 [[0]
 [0]
 [1]
 [3]]

1 answer

User

#1 · Posted 2024-09-19 23:36:46

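You can use a list to keep track of who has already been matched. You also need, for each person, the full list of neighbors sorted by distance rather than only the single nearest neighbor; you get that by changing the value passed to the n_neighbors parameter.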

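# Ask for every person in dfB, so each row of indices ranks all of dfB by distance
knn = NearestNeighbors(n_neighbors=len(dfB)).fit(dfB)
distances, indices = knn.kneighbors(dfA)

matched = []  # positions in dfB that are already taken
pairs = []
for indexA, candidatesB in enumerate(indices):
    personA = dfA.index[indexA]
    # Walk the candidates from nearest to farthest and take the first free one
    for indexB in candidatesB:
        if indexB not in matched:
            matched.append(indexB)
            personB = dfB.index[indexB]
            pairs.append([personA, personB])
            break

matches = pd.DataFrame(pairs, columns=['SetA', 'SetB'])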

The resulting DataFrame looks like this:

       SetA      SetB
0  personA0  personB0
1  personA1  personB1
2  personA2  personB2
3  personA3  personB3
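
One thing to keep in mind with the loop above: it is greedy, so the result can depend on the order of the rows in dfA, because an early person may take a partner that a later person is closer to. As a rough check, here is a minimal sketch in plain Python that compares the greedy pairing with a brute-force search over every possible 1:1 assignment. The helper names (greedy_pairs, optimal_pairs, total_distance) are my own and not part of sklearn, and it assumes Euclidean distance as in the answer. Brute force is only reasonable for a handful of rows; for larger sets, scipy.optimize.linear_sum_assignment solves the same assignment problem efficiently.

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+

# Rows copied from dfA and dfB in the question.
set_a = {
    "personA0": (1, 1, 1, 1),
    "personA1": (1, 1, 2, 2),
    "personA2": (4, 5, 2, 0),
    "personA3": (8, 8, 8, 8),
}
set_b = {
    "personB0": (1, 1, 1, 1),
    "personB1": (1, 1, 1, 2),
    "personB2": (2, 3, 2, 2),
    "personB3": (8, 6, 8, 8),
}


def greedy_pairs(a, b):
    """Give each person in a the closest person in b that is still free."""
    free = list(b)
    pairs = []
    for name_a, vec_a in a.items():
        best = min(free, key=lambda name_b: dist(vec_a, b[name_b]))
        free.remove(best)
        pairs.append((name_a, best))
    return pairs


def optimal_pairs(a, b):
    """Try every 1:1 assignment and keep the one with the smallest total distance."""
    names_a = list(a)
    best = min(
        permutations(b, len(names_a)),
        key=lambda perm: sum(dist(a[x], b[y]) for x, y in zip(names_a, perm)),
    )
    return list(zip(names_a, best))


def total_distance(pairs, a, b):
    """Sum of the distances over all matched pairs."""
    return sum(dist(a[x], b[y]) for x, y in pairs)


greedy = greedy_pairs(set_a, set_b)
optimal = optimal_pairs(set_a, set_b)
print(greedy)
print(total_distance(greedy, set_a, set_b), total_distance(optimal, set_a, set_b))
```

On the data from the question both approaches return the same pairs, so the greedy loop is fine here. If the two totals differ on your own data, the greedy result is not the assignment with the smallest overall distance.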

Note that I used the default metric (minkowski with p=2). If you pass metric=my_dist to NearestNeighbors, the results may differ.
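
To show what swapping the metric looks like, here is a minimal sketch that passes a callable to NearestNeighbors. Since my_dist is not shown in the question, example_dist below is a hypothetical stand-in (plain Manhattan distance), not the asker's function. A callable metric receives two 1-D arrays and must return a single number.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

cols = ['interest0', 'interest2', 'interest3', 'interest4']
dfA = pd.DataFrame([[1, 1, 1, 1], [1, 1, 2, 2], [4, 5, 2, 0], [8, 8, 8, 8]],
                   columns=cols,
                   index=['personA0', 'personA1', 'personA2', 'personA3'])
dfB = pd.DataFrame([[1, 1, 1, 1], [1, 1, 1, 2], [2, 3, 2, 2], [8, 6, 8, 8]],
                   columns=cols,
                   index=['personB0', 'personB1', 'personB2', 'personB3'])


def example_dist(u, v):
    # Hypothetical stand-in for my_dist: Manhattan (L1) distance
    return float(np.abs(u - v).sum())


# Rank every person in dfB for each person in dfA under the custom metric
knn = NearestNeighbors(n_neighbors=len(dfB), metric=example_dist).fit(dfB)
distances, indices = knn.kneighbors(dfA)

# Same greedy pass as above: take the nearest candidate that is still free
matched = []
pairs = []
for indexA, candidatesB in enumerate(indices):
    for indexB in candidatesB:
        if indexB not in matched:
            matched.append(indexB)
            pairs.append([dfA.index[indexA], dfB.index[indexB]])
            break

print(distances[:, 0])  # distance from each person in dfA to their nearest person in dfB
print(pairs)
```

If your distance is one of the built-in ones, passing its name as a string (for example metric='manhattan') is usually faster than passing a Python function.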
