I have a pandas DataFrame A with latitude and longitude columns:
import pandas as pd
df_a = pd.DataFrame([['b',1.591797,103.857887],
['c',1.589416, 103.865322]],
columns = ['place','lat','lng'])
I have another DataFrame B of locations, also with latitude and longitude:
df_b = pd.DataFrame([['ref1',1.594832, 103.853703],
['ref1',1.589749, 103.864678]],
columns = ['place','lat','lng'])
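For each row in A, I want to find the closest matching row in B (subject to a distance limit). --> I already have a function that computes the distance between two pairs of GPS coordinates.
Expected output: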
# a list where each row is the corresponding closest index in B
In [13]: min_index_arr
Out[13]: [0, 1]
One way to do it is:
from math import radians, sin, cos, asin, sqrt

def haversine(pair1, pair2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # each pair is (lat, lng), matching how the loop below builds them
    lat1, lon1 = pair1
    lat2, lon2 = pair2
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
import operator

min_index_arr = []
for i in df_a.index:
    pair1 = df_a['lat'][i], df_a['lng'][i]
    # distances from row i of A to every row of B
    dist_array = []
    for j in df_b.index:
        pair2 = df_b['lat'][j], df_b['lng'][j]
        dist = haversine(pair1, pair2)
        dist_array.append(dist)
    # position of the smallest distance, i.e. the closest row in B
    min_index, min_value = min(enumerate(dist_array), key=operator.itemgetter(1))
    min_index_arr.append(min_index)
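But I'm sure there is a faster way to do this. It looks a lot like an outer product, except that instead of multiplying, a function is applied to each pair. Does anyone know how to do this?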
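The "outer product with a function" idea can be sketched with NumPy broadcasting: subtracting a row vector from a column vector yields the full A-by-B grid in one step, so the haversine formula is evaluated for every pair without Python loops. This is only a sketch, not the asker's code; `pairwise_haversine_km`, `radius_km`, `dist_km` and `min_dist_km` are names made up for this illustration.

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame([['b', 1.591797, 103.857887],
                     ['c', 1.589416, 103.865322]],
                    columns=['place', 'lat', 'lng'])
df_b = pd.DataFrame([['ref1', 1.594832, 103.853703],
                     ['ref1', 1.589749, 103.864678]],
                    columns=['place', 'lat', 'lng'])

def pairwise_haversine_km(lat_a, lng_a, lat_b, lng_b, radius_km=6371.0):
    """Return a (len(a), len(b)) matrix of great-circle distances in km.

    Hypothetical helper for this sketch; inputs are in decimal degrees.
    """
    lat_a, lng_a, lat_b, lng_b = (np.radians(np.asarray(x, dtype=float))
                                  for x in (lat_a, lng_a, lat_b, lng_b))
    # A as a column, B as a row: broadcasting builds the whole grid.
    dlat = lat_b[None, :] - lat_a[:, None]
    dlng = lng_b[None, :] - lng_a[:, None]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(lat_a)[:, None] * np.cos(lat_b)[None, :] * np.sin(dlng / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(h))

dist_km = pairwise_haversine_km(df_a['lat'], df_a['lng'],
                                df_b['lat'], df_b['lng'])

# For each row of A: position of the closest row in B, and its distance.
min_index_arr = dist_km.argmin(axis=1).tolist()
min_dist_km = dist_km.min(axis=1)
```

The distance matrix holds len(A) × len(B) floats, so this suits small and medium inputs; for large ones a spatial index is probably the better choice. A distance limit can be applied afterwards by comparing `min_dist_km` against a threshold.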
You can use the approach from "KDTree for longitude/latitude", which is based on sklearn.neighbors.BallTree.
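A minimal sketch of what a BallTree-based lookup could look like is shown below; it is an illustration under assumptions, not the original answer's code. scikit-learn's `haversine` metric expects `[lat, lng]` pairs in radians and returns distances in radians, which are multiplied by an Earth radius to get kilometres. `EARTH_RADIUS_KM`, `max_km` and `within_limit` are hypothetical names, and the 1 km limit is an arbitrary example value.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

df_a = pd.DataFrame([['b', 1.591797, 103.857887],
                     ['c', 1.589416, 103.865322]],
                    columns=['place', 'lat', 'lng'])
df_b = pd.DataFrame([['ref1', 1.594832, 103.853703],
                     ['ref1', 1.589749, 103.864678]],
                    columns=['place', 'lat', 'lng'])

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# The haversine metric works on [lat, lng] in radians.
a_rad = np.radians(df_a[['lat', 'lng']].to_numpy())
b_rad = np.radians(df_b[['lat', 'lng']].to_numpy())

# Build the tree once on B, then query the nearest neighbour of each A row.
tree = BallTree(b_rad, metric='haversine')
dist_rad, idx = tree.query(a_rad, k=1)

# idx holds row positions in df_b (not index labels).
min_index_arr = idx[:, 0].tolist()
min_dist_km = dist_rad[:, 0] * EARTH_RADIUS_KM

# Optional distance limit (hypothetical threshold): flag matches within 1 km.
max_km = 1.0
within_limit = min_dist_km <= max_km
```

With a default `RangeIndex` the positions in `min_index_arr` coincide with the index labels of B; otherwise `df_b.index[min_index_arr]` maps them back to labels.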