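<p>Adapting <a href="https://stackoverflow.com/a/54660816/1840471">this answer</a> to avoid computing the full distance matrix, you can find the nearest record in <code>XB</code> and its distance for a single row of <code>XA</code> (<code>nearest_record1()</code>), then use <code>apply</code> to run that over every row of <code>XA</code> (<code>nearest_record()</code>). This cut runtime by about 85% in a <a href="https://colab.research.google.com/drive/18pRjKi44PZNEZfYVR7XAwiCHQ4ixUs9l" rel="nofollow noreferrer">test</a>.</p>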
<pre><code>import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist


def nearest_record1(XA1, XB):
    """Get the nearest record in XB to a single record XA1.

    Args:
        XA1: Series. A single row of XA.
        XB: DataFrame.

    Returns:
        Series with dist (distance to the nearest record in XB) and id_B
        (the row position of that record in XB).
    """
    # Distances from XA1 to every row of XB; cdist returns shape
    # (1, len(XB)), so take row 0.
    dist = cdist(XA1.values.reshape(1, -1), XB)[0]
    return pd.Series({'dist': np.amin(dist), 'id_B': np.argmin(dist)})


def nearest_record(XA, XB):
    """Get the nearest record in XB for each record in XA.

    Args:
        XA: DataFrame. Each record is matched against the nearest in XB.
        XB: DataFrame.

    Returns:
        DataFrame with columns for id_A (from XA), id_B (from XB), and dist.
        Each id_A maps to a single id_B, which is the nearest record from XB.
    """
    res = XA.apply(lambda x: nearest_record1(x, XB), axis=1)
    res['id_A'] = XA.index
    # id_B can come back as float because it shares a Series with dist.
    res['id_B'] = res.id_B.astype(int)
    # Reorder columns.
    return res[['id_A', 'id_B', 'dist']]
</code></pre>
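<p>To make the memory difference concrete, here is a NumPy-only sketch on hypothetical random arrays <code>A</code> and <code>B</code> (stand-ins for <code>XA</code> and <code>XB</code>, not the question's data). It compares one <code>cdist</code> call over everything with a loop that, like <code>nearest_record1()</code>, computes one row of distances at a time:</p>
<pre><code>import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical random data, standing in for XA and XB as plain arrays.
rng = np.random.default_rng(0)
A = rng.random((1000, 3))
B = rng.random((2000, 3))

# Full-matrix approach: cdist materializes a len(A) x len(B) array at once.
full = cdist(A, B)
full_idx = full.argmin(axis=1)
full_dist = full.min(axis=1)

# Row-at-a-time approach (what nearest_record1 does for one row):
# only len(B) distances exist at any moment.
row_idx = np.empty(len(A), dtype=int)
row_dist = np.empty(len(A))
for i, a in enumerate(A):
    d = cdist(a.reshape(1, -1), B)[0]
    row_idx[i] = d.argmin()
    row_dist[i] = d[row_idx[i]]

print(full.shape)                        # (1000, 2000)
print(np.allclose(full_dist, row_dist))  # True
</code></pre>
<p>Here the full matrix holds 2,000,000 float64 values (about 16 MB), while the loop never holds more than 2,000 distances at once. Both give the same nearest distances; the loop trades that memory for Python-level iteration, the same trade <code>nearest_record()</code> makes through <code>apply</code>.</p>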
<p><code>nearest_record</code> also returns the correct result:</p>
<pre><code>nearest_record(XA, XB)
   id_A  id_B      dist
0     0     2  5.099020
1     1     2  4.472136
2     2     2  4.242641
</code></pre>
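<p>Note that <code>np.argmin</code> returns a position, so <code>id_B</code> is the row position in <code>XB</code>; it equals the index label only when <code>XB</code> has the default integer index. A minimal sketch with a hypothetical <code>XB</code> labelled <code>'p'</code>, <code>'q'</code>, <code>'r'</code> and a hypothetical record <code>XA1</code> shows how to map the position back to a label:</p>
<pre><code>import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical XB whose index is not 0..n-1.
XB = pd.DataFrame({'x': [0.0, 5.0, 10.0], 'y': [0.0, 5.0, 10.0]},
                  index=['p', 'q', 'r'])
# Hypothetical single record from XA.
XA1 = pd.Series({'x': 4.0, 'y': 6.0})

dist = cdist(XA1.values.reshape(1, -1), XB)[0]
pos = int(np.argmin(dist))  # row position in XB, as nearest_record1 returns it
label = XB.index[pos]       # the corresponding index label

print(pos, label)  # 1 q
</code></pre>
<p>Inside <code>nearest_record()</code>, a similar mapping could be applied after the <code>astype(int)</code> step, for example <code>res['id_B'] = XB.index.take(res.id_B)</code>.</p>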