<p>Here is a vectorized numpy version of the same function:</p>
<pre><code>import numpy as np
def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.
    """
    # Convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    # Haversine formula, evaluated elementwise over the input arrays
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km: approximate radius of the earth
    return km
</code></pre>
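<p>The inputs are all arrays of values, so a single call should handle millions of points very quickly. The function expects ndarrays, but the columns of a pandas table work as well.</p>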
<p>For example, with randomly generated values:</p>
<pre><code>>>> import numpy as np
>>> import pandas
>>> lon1, lon2, lat1, lat2 = np.random.randn(4, 1000000)
>>> df = pandas.DataFrame(data={'lon1':lon1,'lon2':lon2,'lat1':lat1,'lat2':lat2})
>>> km = haversine_np(df['lon1'],df['lat1'],df['lon2'],df['lat2'])
</code></pre>
<p>Or, if you want to create another column:</p>
<pre><code>>>> df['distance'] = haversine_np(df['lon1'],df['lat1'],df['lon2'],df['lat2'])
</code></pre>
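<p>As a quick sanity check, the sketch below calls the same function on points along the equator, where the expected distances are easy to work out by hand. It also passes one fixed origin as scalars, relying on numpy broadcasting rather than on equal-length arrays. The names <code>dest_lon</code>, <code>dest_lat</code> and <code>expected</code> are made up for this illustration.</p>

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

# One fixed origin (scalars) against three destinations on the equator
# (illustrative values, not from the original answer).
dest_lon = np.array([0.0, 90.0, 180.0])
dest_lat = np.zeros(3)
km = haversine_np(0.0, 0.0, dest_lon, dest_lat)

# Zero, a quarter and a half of a great circle of radius 6367 km.
expected = 6367 * np.pi * np.array([0.0, 0.5, 1.0])
print(np.allclose(km, expected))
```

<p>Because every operation in the function is elementwise, a scalar origin is broadcast against the destination arrays, which is handy for "distance from one point to many" queries.</p>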
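<p>Looping over arrays of data is very slow in Python. Numpy provides functions that operate on entire arrays at once, which lets you avoid the loop and improves performance dramatically.</p>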
<p>This is an example of <a href="http://en.wikipedia.org/wiki/Array_programming" rel="noreferrer">vectorization</a>.</p>
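<p>To see the difference for yourself, here is a rough sketch that runs a plain Python loop and the vectorized function on the same data, checks that they agree, and prints the elapsed times. <code>haversine_loop</code> is a hypothetical loop-based counterpart written for this comparison, and the array size and random ranges are arbitrary. Actual timings depend on your machine.</p>

```python
import math
import time

import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized great-circle distance in km (inputs in decimal degrees)."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

def haversine_loop(lon1, lat1, lon2, lat2):
    """Same formula, one pair of points at a time (hypothetical helper)."""
    out = []
    for x1, y1, x2, y2 in zip(lon1, lat1, lon2, lat2):
        x1, y1, x2, y2 = map(math.radians, (x1, y1, x2, y2))
        a = (math.sin((y2 - y1) / 2.0) ** 2
             + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2.0) ** 2)
        out.append(6367 * 2 * math.asin(math.sqrt(a)))
    return out

# Random coordinates in valid longitude/latitude ranges (arbitrary test data).
n = 100_000
rng = np.random.default_rng(0)
lon1, lon2 = rng.uniform(-180, 180, size=(2, n))
lat1, lat2 = rng.uniform(-90, 90, size=(2, n))

t0 = time.perf_counter()
loop_km = haversine_loop(lon1, lat1, lon2, lat2)
t1 = time.perf_counter()
vec_km = haversine_np(lon1, lat1, lon2, lat2)
t2 = time.perf_counter()

print("results agree:", np.allclose(loop_km, vec_km))
print(f"loop: {t1 - t0:.3f} s, vectorized: {t2 - t1:.3f} s")
```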