<p><strong>UPDATE:</strong> I would suggest first building a distance DataFrame:</p>
<pre><code>from itertools import combinations
import pandas as pd
from scipy.spatial.distance import pdist
# see definition of "haversine_np()" below
# combinations() yields city pairs in the same order as pdist()'s condensed output
x = pd.DataFrame({'dist': pdist(df[['lat', 'lng']], haversine_np)},
                 index=pd.MultiIndex.from_tuples(tuple(combinations(df['city'], 2))))
</code></pre>
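<p>This efficiently produces a DataFrame of pairwise distances without duplicates: one row per unordered pair of cities, indexed by a two-level <code>MultiIndex</code> of city names, with a single <code>dist</code> column in kilometres.</p>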
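<p>As a minimal sketch of how such a pairwise frame can be queried, the snippet below rebuilds it for three cities and filters it. The coordinates are illustrative assumptions (they are not taken from the question's data), and the names <code>pairs</code>, <code>far_pairs</code> and <code>berlin</code> are hypothetical.</p>

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist


def haversine_np(p1, p2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([p1[0], p1[1], p2[0], p2[1]])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Illustrative coordinates (assumed, not the question's actual data)
df = pd.DataFrame({
    'city': ['Berlin', 'Potsdam', 'Hamburg'],
    'lat': [52.5200, 52.3906, 53.5511],
    'lng': [13.4050, 13.0645, 9.9937],
})

# One row per unordered pair of cities
pairs = pd.DataFrame(
    {'dist': pdist(df[['lat', 'lng']].to_numpy(), haversine_np)},
    index=pd.MultiIndex.from_tuples(list(combinations(df['city'], 2))),
)

# Pairs of cities that are more than 30 km apart
far_pairs = pairs[pairs['dist'] > 30]

# Every distance that involves Berlin, on either side of the pair
first = pairs.index.get_level_values(0)
second = pairs.index.get_level_values(1)
berlin = pairs[(first == 'Berlin') | (second == 'Berlin')]
```

<p>Because each pair is stored only once, a lookup for a single city has to check both index levels, as done for <code>berlin</code> above.</p>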
<hr/>
<p><strong>OLD answer:</strong></p>
<p>Here is a slightly optimized version, which uses the <a href="https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist" rel="nofollow noreferrer">scipy.spatial.distance.pdist</a> method:</p>
<pre><code>import numpy as np
import pandas as pd
from scipy.spatial.distance import squareform, pdist

# slightly modified version of http://stackoverflow.com/a/29546836/2901002
def haversine_np(p1, p2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees).

    p1 and p2 are (lat, lng) pairs; the result is in kilometres.
    """
    lat1, lon1, lat2, lon2 = np.radians([p1[0], p1[1],
                                         p2[0], p2[1]])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km is the Earth radius used here
    return km

# squareform() expands the condensed pdist() output into a full symmetric matrix
x = pd.DataFrame(squareform(pdist(df[['lat','lng']], haversine_np)),
                 columns=df.city.unique(),
                 index=df.city.unique())
</code></pre>
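<p>Note that <code>pdist</code> calls a Python callable once per pair, which can become slow for many cities. As an alternative sketch (a different technique from the one above, not part of the original answer), the same haversine formula can be evaluated for all pairs at once with NumPy broadcasting. The function name <code>haversine_matrix</code> and its <code>radius_km</code> parameter are hypothetical, and the approach needs memory proportional to the square of the number of points.</p>

```python
import numpy as np


def haversine_matrix(lat, lng, radius_km=6367.0):
    """Return the full n x n matrix of great-circle distances in km.

    lat and lng are 1-D sequences of equal length, in decimal degrees.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lng = np.radians(np.asarray(lng, dtype=float))
    # Broadcasting a column against a row gives every pairwise difference
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2.0) ** 2)
    return radius_km * 2 * np.arcsin(np.sqrt(a))
```

<p>Assuming the same <code>df</code> as above, the result of <code>haversine_matrix(df['lat'], df['lng'])</code> could be wrapped in a DataFrame with <code>df.city.unique()</code> as index and columns, just like the <code>squareform</code> output.</p>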
<p>which gives us:</p>
<pre><code>In [78]: x
Out[78]:
             Berlin     Potsdam     Hamburg
Berlin     0.000000   27.198616  255.063541
Potsdam   27.198616    0.000000  242.311890
Hamburg  255.063541  242.311890    0.000000
</code></pre>
<p>Let's count, for each city, the number of cities that are more than 30 km away:</p>
<pre><code>In [81]: x.groupby(level=0, as_index=False) \
    ...:  .apply(lambda c: c[c>30].notnull().sum(1)) \
    ...:  .reset_index(level=0, drop=True)
Out[81]:
Berlin     1
Hamburg    2
Potsdam    1
dtype: int64
</code></pre>
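<p>When the city names in the index are unique, as they are here, the same counts can probably be obtained more directly with a boolean mask. The sketch below rebuilds <code>x</code> from the distances printed above; the name <code>counts</code> is hypothetical.</p>

```python
import pandas as pd

cities = ['Berlin', 'Potsdam', 'Hamburg']

# Distance matrix copied from the output shown above (km)
x = pd.DataFrame(
    [[0.000000, 27.198616, 255.063541],
     [27.198616, 0.000000, 242.311890],
     [255.063541, 242.311890, 0.000000]],
    index=cities, columns=cities,
)

# True where the distance exceeds 30 km; summing along each row counts them
counts = (x > 30).sum(axis=1)
```

<p>Unlike the <code>groupby</code> version, this keeps the rows in their original order instead of sorting them by city name.</p>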