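Is there a faster way (in Python, using the CPU) to do the same thing as the function below? I used for loops and if statements and would like to know whether there is a faster approach. At the moment it takes about 1 minute to run this function per 100 postcodes, and I have about 70,000 postcodes to process.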
The two dataframes used are:
postcode_df, which contains 71092 rows and the columns shown in this example:
postcode_df = pd.DataFrame({"Postcode":["SK12 2LH", "SK7 6LQ"],
                            "Latitude":[53.362549, 53.373812],
                            "Longitude":[-2.061329, -2.120956]})
air, which contains 421 rows and the columns shown in this example:
air = pd.DataFrame({"TubeRef":["Stkprt35", "Stkprt07", "Stkprt33"],
                    "Latitude":[53.365085, 53.379502, 53.407510],
                    "Longitude":[-2.0763, -2.120777, -2.145632]})
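The function loops over every postcode in postcode_df and, for each postcode, loops over every TubeRef, calculates the distance between the two (using geopy), and saves the TubeRef with the shortest distance to the postcode.
The output dataframe, postcode_nearest_tube_refs, contains the nearest tube for each postcode, with the columns built at the end of the function below: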
# define function to get nearest air quality monitoring tube per postcode
def get_nearest_tubes(constituency_list):
    postcodes = []
    nearest_tubes = []
    distances_to_tubes = []
    for postcode in postcode_df["Postcode"]:
        closest_tube = ""
        shortest_dist = 500
        postcode_lat = postcode_df.loc[postcode_df["Postcode"]==postcode, "Latitude"]
        postcode_long = postcode_df.loc[postcode_df["Postcode"]==postcode, "Longitude"]
        postcode_coord = (float(postcode_lat), float(postcode_long))
        for tuberef in air["TubeRef"]:
            tube_lat = air.loc[air["TubeRef"]==tuberef, "Latitude"]
            tube_long = air.loc[air["TubeRef"]==tuberef, "Longitude"]
            tube_coord = (float(tube_lat), float(tube_long))
            # calculate distance between postcode and tube
            dist_to_tube = geopy.distance.distance(postcode_coord, tube_coord).km
            if dist_to_tube < shortest_dist:
                shortest_dist = dist_to_tube
                closest_tube = str(tuberef)
        # save postcode's tuberef with shortest distance
        postcodes.append(str(postcode))
        nearest_tubes.append(str(closest_tube))
        distances_to_tubes.append(shortest_dist)
    # create dataframe of the postcodes, nearest tuberefs and distance
    postcode_nearest_tube_refs = pd.DataFrame({"Postcode":postcodes,
                                               "Nearest Air Tube":nearest_tubes,
                                               "Distance to Air Tube KM": distances_to_tubes})
    return postcode_nearest_tube_refs
The libraries I am using are:
import numpy as np
import pandas as pd
# !pip install geopy
import geopy.distance
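Here is a working example that runs in seconds (< 10).
Import the libraries.
I generate some random data, which also takes a second, but at least we have a realistic amount of data,
and use UUIDs as fake postcodes.
We do the same for air,
again using a UUID as a fake reference.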
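The answer's own data-generation code is not shown here, so the following is only a minimal sketch of what those steps could look like. The helper name `fake_points`, the fixed seed, and the latitude/longitude bounding box are assumptions made for illustration; the frame and column names are taken from the question.

```python
import uuid

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) so the sketch is repeatable


def fake_points(n, id_column):
    """Hypothetical helper: n random points, each labelled with a UUID string."""
    return pd.DataFrame({
        id_column: [str(uuid.uuid4()) for _ in range(n)],
        # Assumed bounding box, loosely around the coordinates in the question.
        "Latitude": rng.uniform(53.3, 53.5, n),
        "Longitude": rng.uniform(-2.2, -2.0, n),
    })


# Same row counts as the real data described in the question.
postcode_df = fake_points(71092, "Postcode")
air = fake_points(421, "TubeRef")
```

Using UUIDs keeps the fake identifiers unique without needing a real postcode list.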
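Extract the GPS values as numpy arrays.
Create a ball tree.
Query for the single closest neighbour.
Note that the distances are not in kilometres and need to be converted first.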
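The steps above can be sketched as follows. This is not the answer's original code, which is missing here; it assumes the ball tree is scikit-learn's `BallTree` with the haversine metric, which takes [latitude, longitude] in radians and returns distances in radians. The Earth radius constant and the use of the question's small example frames are also assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation (assumption)

postcode_df = pd.DataFrame({"Postcode": ["SK12 2LH", "SK7 6LQ"],
                            "Latitude": [53.362549, 53.373812],
                            "Longitude": [-2.061329, -2.120956]})
air = pd.DataFrame({"TubeRef": ["Stkprt35", "Stkprt07", "Stkprt33"],
                    "Latitude": [53.365085, 53.379502, 53.407510],
                    "Longitude": [-2.0763, -2.120777, -2.145632]})

# The haversine metric expects [latitude, longitude] pairs in radians.
postcode_gps = np.radians(postcode_df[["Latitude", "Longitude"]].to_numpy())
air_gps = np.radians(air[["Latitude", "Longitude"]].to_numpy())

# Build the tree once over the tubes, then query every postcode in one call.
tree = BallTree(air_gps, metric="haversine")
dist, index = tree.query(postcode_gps, k=1)  # both arrays have shape (n_postcodes, 1)

nearest = pd.DataFrame({
    "Postcode": postcode_df["Postcode"],
    "Nearest Air Tube": air["TubeRef"].to_numpy()[index[:, 0]],
    # Radians on the unit sphere times the Earth radius gives kilometres.
    "Distance to Air Tube KM": dist[:, 0] * EARTH_RADIUS_KM,
})
```

Haversine distances assume a spherical Earth, so they will differ slightly from the geodesic distances that `geopy.distance.distance` computes in the question.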
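To get the refs of the nearest tubes, index with the query result, for example:
tube_df.ref[ index[:,0] ]
You can also use numpy to compute the distance matrix from every point in set A to every point in set B, and then take only the point in set A that corresponds to the minimum distance.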
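A minimal sketch of the distance-matrix idea, assuming a haversine (spherical Earth) formula in place of geopy's geodesic distance. The function name `haversine_matrix` and the Earth radius constant are illustrative, not from the answer. Here the postcodes are the rows and the tubes are the columns, so taking the minimum along each row gives the nearest tube per postcode.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation (assumption)


def haversine_matrix(lat_a, lon_a, lat_b, lon_b):
    """Distances in km between every point in A (rows) and every point in B (columns)."""
    lat_a, lon_a, lat_b, lon_b = (
        np.radians(np.asarray(x, dtype=float)) for x in (lat_a, lon_a, lat_b, lon_b)
    )
    # Broadcasting a column vector against a row vector yields the full matrix.
    dlat = lat_a[:, None] - lat_b[None, :]
    dlon = lon_a[:, None] - lon_b[None, :]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(lat_a)[:, None] * np.cos(lat_b)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


postcode_df = pd.DataFrame({"Postcode": ["SK12 2LH", "SK7 6LQ"],
                            "Latitude": [53.362549, 53.373812],
                            "Longitude": [-2.061329, -2.120956]})
air = pd.DataFrame({"TubeRef": ["Stkprt35", "Stkprt07", "Stkprt33"],
                    "Latitude": [53.365085, 53.379502, 53.407510],
                    "Longitude": [-2.0763, -2.120777, -2.145632]})

dist_km = haversine_matrix(postcode_df["Latitude"], postcode_df["Longitude"],
                           air["Latitude"], air["Longitude"])
closest = dist_km.argmin(axis=1)  # column index of the nearest tube for each postcode

postcode_nearest_tube_refs = pd.DataFrame({
    "Postcode": postcode_df["Postcode"],
    "Nearest Air Tube": air["TubeRef"].to_numpy()[closest],
    "Distance to Air Tube KM": dist_km[np.arange(len(closest)), closest],
})
```

For the full data the matrix has 71092 x 421 entries (roughly 30 million), about 240 MB as float64 for each intermediate array, so processing the postcodes in chunks may be necessary if memory is tight.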