<p>One approach is to compute the full distance matrix, <code>melt</code> it, and aggregate with <code>nsmallest</code>, which returns both the index and the value:</p>
<pre><code>import pandas as pd
from scipy.spatial.distance import cdist

def nearest_record(XA, XB):
    """Get the nearest record in XB for each record in XA.

    Args:
        XA: DataFrame. Each record is matched against the nearest in XB.
        XB: DataFrame.

    Returns:
        DataFrame with columns for id_A (from XA), id_B (from XB), and dist.
        Each id_A maps to a single id_B, which is the nearest record from XB.
    """
    # Full pairwise distance matrix, melted to one row per (id_A, id_B) pair.
    dist = pd.DataFrame(cdist(XA, XB)).reset_index().melt('index')
    dist.columns = ['id_A', 'id_B', 'dist']
    # id_B is sometimes returned as an object.
    dist['id_B'] = dist.id_B.astype(int)
    dist.reset_index(drop=True, inplace=True)
    # nsmallest keeps the original row label (level_1), used to recover id_B.
    nearest = dist.groupby('id_A').dist.nsmallest(1).reset_index()
    return nearest.set_index('level_1').join(dist.id_B).reset_index(drop=True)
</code></pre>
<p>This shows that <code>id_B</code> 2 is the nearest record to all three records in <code>XA</code>:</p>
<pre><code>nearest_record(XA, XB)
   id_A      dist  id_B
0     0  5.099020     2
1     1  4.472136     2
2     2  4.242641     2
</code></pre>
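<p>However, because this computes the full distance matrix, it will be slow or fail outright when <code>XA</code> and <code>XB</code> are large. An alternative that finds the nearest record one row at a time would likely be faster.</p>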
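<p>One possible way to avoid materializing the full matrix is a k-d tree lookup with SciPy's <code>cKDTree</code>. The following is only a minimal sketch under stated assumptions: Euclidean distance (the <code>cdist</code> default), all-numeric columns, and positional ids as in the function above. The name <code>nearest_record_kdtree</code> is hypothetical, and whether it is actually faster depends on the data's size and dimensionality.</p>

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def nearest_record_kdtree(XA, XB):
    """For each record in XA, find the nearest record in XB.

    Hypothetical helper: assumes Euclidean distance and numeric columns.
    Returns a DataFrame with positional id_A, dist, and positional id_B.
    """
    # Build the tree once on XB, then query it with every row of XA.
    tree = cKDTree(np.asarray(XB, dtype=float))
    dist, idx = tree.query(np.asarray(XA, dtype=float), k=1)
    return pd.DataFrame({
        'id_A': np.arange(len(dist)),
        'dist': dist,
        'id_B': idx,
    })


# Small made-up inputs, for illustration only.
XA_demo = pd.DataFrame({'x': [0.0, 10.0], 'y': [0.0, 10.0]})
XB_demo = pd.DataFrame({'x': [1.0, 9.0, 5.0], 'y': [0.0, 10.0, 5.0]})
print(nearest_record_kdtree(XA_demo, XB_demo))
```

<p>Memory use here scales with the number of records rather than with the product of the two table sizes, which is the part of the full-matrix approach that tends to fail on large inputs.</p>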