Speeding up nested for loops over two Pandas DataFrames

import pandas as pd
from math import cos, asin, sqrt

# df: points with 'lat'/'lon' columns; areadf: stops whose first two columns are lat, lon.
# getDistance must be defined before the loop that calls it.
def getDistance(lat1, lon1, lat2, lon2):
    p = 0.017453292519943295  # Pi/180
    a = (0.5 - cos((lat2 - lat1) * p) / 2
         + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
    # 12742 km = Earth's diameter; the extra * 100 scales km to units of 10 m
    return 12742 * asin(sqrt(a)) * 100

R = 5
lats = df['lat']
lons = df['lon']
for stop in areadf.itertuples():
    for index in df.index:
        if getDistance(lats[index], lons[index], stop[1], stop[2]) < R:
            df.at[index, 'stop_id'] = stop[0]  # id
            df.at[index, 'stoplat'] = stop[1]  # lat
            df.at[index, 'stoplon'] = stop[2]  # lon
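For comparison with the answer below: the question's getDistance is the haversine formula written with the half-angle identity 1 − cos x = 2 sin²(x/2). With p = π/180 converting degrees to radians, φ for latitude, λ for longitude, Δφ = φ₂ − φ₁ and Δλ = λ₂ − λ₁:

$$
a = \frac{1}{2} - \frac{\cos\Delta\varphi}{2} + \cos\varphi_1\cos\varphi_2\,\frac{1-\cos\Delta\lambda}{2}
  = \sin^2\frac{\Delta\varphi}{2} + \cos\varphi_1\cos\varphi_2\,\sin^2\frac{\Delta\lambda}{2}
$$

$$
d = 12742\,\arcsin\sqrt{a} = 2 \cdot 6371\,\arcsin\sqrt{a}\ \text{km}
$$

This is the haversine distance with an Earth radius of 6371 km. The answer's haversine_np computes the same a and uses 6367 km, so the two differ by less than 0.1%. Because getDistance multiplies by 100, the test getDistance(...) < 5 in the question means d < 0.05 km, i.e. 50 m.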

Answer (posted 2024-09-26 17:52:29)

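One way is to use the NumPy haversine function from here, slightly modified so that it also checks the result against the required radius. The modified function returns the position of the nearest stop in areadf, or -1 if no stop lies within R.

Then simply use apply to go over your df and find, for each row, the closest stop within the given radius: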

import numpy as np

def haversine_np(lon1, lat1, lon2, lat2, R):
    """
    Calculate the great circle distance (km) between one point (lon1, lat1)
    and an array of points (lon2, lat2), all in decimal degrees.
    Return the position of the nearest point if it lies within R km,
    otherwise -1.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km is used as the Earth's radius
    if km.min() <= R:
        return km.argmin()  # position (not index label) of the nearest stop
    else:
        return -1

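# Search radius R = 1 km. Columns are accessed by name: positional row[1] / row[0]
# on a label-indexed Series is deprecated in recent pandas.
df['dex'] = df[['lat', 'lon']].apply(
    lambda row: haversine_np(row['lon'], row['lat'],
                             areadf.stoplon.values, areadf.stoplat.values, 1),
    axis=1)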
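The apply call above still runs a Python-level loop over the rows of df. If both frames fit comfortably in memory, one alternative (not part of the original answer) is to broadcast the same haversine formula over all point/stop pairs at once. A minimal sketch, assuming df-like lat/lon inputs and areadf-like stoplat/stoplon inputs as above; nearest_stop_within is a hypothetical helper name and the coordinates are made-up example data:

```python
import numpy as np
import pandas as pd


def nearest_stop_within(lat, lon, stop_lat, stop_lon, radius_km, earth_radius_km=6367.0):
    """Hypothetical helper: for every point, return the position of the nearest
    stop within radius_km, or -1 if no stop is that close."""
    # Column vectors for the points (n, 1) and row vectors for the stops (1, m),
    # so the arithmetic below broadcasts to an (n, m) matrix of all pairs.
    lat1 = np.radians(np.asarray(lat, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(stop_lat, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(stop_lon, dtype=float))[None, :]

    # Same haversine formula as haversine_np; clip guards against tiny float overshoot.
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    km = 2 * earth_radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    nearest = km.argmin(axis=1)                   # position of closest stop per point
    nearest_km = km[np.arange(len(km)), nearest]  # distance to that stop
    return np.where(nearest_km <= radius_km, nearest, -1)


# Made-up example data (not from the question).
points = pd.DataFrame({'lat': [52.5200, 48.8566], 'lon': [13.4050, 2.3522]})
stops = pd.DataFrame({'stoplat': [52.5201, 40.7128], 'stoplon': [13.4049, -74.0060]})

points['dex'] = nearest_stop_within(points['lat'], points['lon'],
                                    stops['stoplat'], stops['stoplon'], radius_km=1)
print(points['dex'].tolist())  # [0, -1]
```

The trade-off is memory: the distance matrix has len(df) × len(areadf) entries, so for very large inputs the frames would need to be processed in chunks.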

Then merge the two DataFrames:

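A minimal sketch of that merge, assuming df['dex'] holds the positions produced by the apply call (with -1 for no match) and areadf still has its default 0..n-1 index. The frames are made-up example data, and the left join on dex is one possible choice rather than a confirmed copy of the answer's code:

```python
import pandas as pd

# Made-up example data: df already has a 'dex' column (position of the matched
# stop in areadf, or -1 when no stop was within the radius).
df = pd.DataFrame({'lat': [52.5200, 48.8566], 'lon': [13.4050, 2.3522], 'dex': [0, -1]})
areadf = pd.DataFrame({'stoplat': [52.5201, 40.7128], 'stoplon': [13.4049, -74.0060]})

# Match df['dex'] against areadf's index labels. This only lines up with argmin()
# positions if areadf has a 0..n-1 index. how='left' keeps every row of df;
# rows with dex == -1 find no partner and get NaN in the stop columns.
merged = df.merge(areadf, left_on='dex', right_index=True, how='left')
print(merged[['lat', 'lon', 'dex', 'stoplat', 'stoplon']])
```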

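Note: if you use this approach, you must make sure the indexes of both DataFrames have been reset, i.e. each runs sequentially from 0 to its length minus 1. argmin() returns a position, while the merge matches dex against index labels, so the two only agree on a default index. Be sure to reset the indexes before running this: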

df.reset_index(drop=True,inplace=True)
areadf.reset_index(drop=True,inplace=True)
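To illustrate why the reset matters for areadf: the dex values are positions, but the merge looks them up as index labels. A small sketch with made-up data, where areadf's index was left as 5, 7, 9 (for example after filtering):

```python
import pandas as pd

# Made-up stops table whose index is no longer 0..n-1 (e.g. after filtering).
areadf = pd.DataFrame({'stoplat': [10.0, 20.0, 30.0]}, index=[5, 7, 9])
df = pd.DataFrame({'dex': [0, 2]})  # positions, as argmin() would return them

# Positions 0 and 2 are not labels in areadf's index, so nothing matches.
before = df.merge(areadf, left_on='dex', right_index=True, how='left')
print(before['stoplat'].isna().all())  # True

# After resetting, labels equal positions and the lookup works.
areadf.reset_index(drop=True, inplace=True)
after = df.merge(areadf, left_on='dex', right_index=True, how='left')
print(after['stoplat'].tolist())  # [10.0, 30.0]
```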
