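<p>Once again I find myself stuck with pandas, and with how best to perform a "vector operation". My code works, but it takes a long time to iterate through everything.</p>
<p>What the code is trying to do is loop through <code>shapes.csv</code>, determine which <code>shape_pt_sequence</code> corresponds to a <code>stop_id</code>, and then assign the <code>stop_lat</code> and <code>stop_lon</code> to <code>shape_pt_lat</code> and <code>shape_pt_lon</code>, while also marking that <code>shape_pt_sequence</code> as <code>is_stop</code>.</p>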
<p>Gists:</p>
<p><code>stop_times.csv</code><a href="https://gist.github.com/adampitchie/127a960db410f38fc747" rel="nofollow">LINK</a></p>
<p><code>trips.csv</code><a href="https://gist.github.com/adampitchie/25d45380e018b7d5c387" rel="nofollow">LINK</a></p>
<p><code>shapes.csv</code><a href="https://gist.github.com/adampitchie/9a2908efdcd580bc59b2" rel="nofollow">LINK</a></p>
<p>Here is my code:</p>
<pre><code>import pandas as pd
from haversine import haversine_mi

'''
iterate through shapes and match stops along a shape_pt_sequence within
x amount of distance. for shape_pt_sequence that is closest, replace the stop
lat/lon to the shape_pt_lat/shape_pt_lon, and mark is_stop column with 1.
'''

# readability assignments for shapes.csv
shapes = pd.read_csv('csv/shapes.csv')
shapes_index = list(set(shapes['shape_id']))
shapes_index.sort(key=int)
shapes.set_index(['shape_id', 'shape_pt_sequence'], inplace=True)

# readability assignments for trips.csv
trips = pd.read_csv('csv/trips.csv')
trips_index = list(set(trips['trip_id']))
trips.set_index(['trip_id'], inplace=True)

# readability assignments for stop_times.csv
stop_times = pd.read_csv('csv/stop_times.csv')
stop_times.set_index(['trip_id', 'stop_sequence'], inplace=True)
print(len(stop_times.loc[1423492]))

# readability assignments for stops.csv
stops = pd.read_csv('csv/stops.csv')
stops.set_index(['stop_id'], inplace=True)

# for each trip_id
for i in trips_index:
    print('******NEW TRIP_ID******')
    print(i)
    i = int(i)
    trip_stops = stop_times.loc[i]

    # for each stop_sequence in stop_times
    for x in range(len(trip_stops)):
        stop_lat = trip_stops['stop_lat'].iloc[x]
        stop_lon = trip_stops['stop_lon'].iloc[x]
        stop_coordinate = (stop_lat, stop_lon)
        print(stop_coordinate)

        # shape_id that matches trip_id
        print('**SHAPE_ID**')
        trips_shape_id = int(trips.loc[i, 'shape_id'])
        print(trips_shape_id)
        shape_points = shapes.loc[trips_shape_id]

        # None (rather than 0) as the sentinel, so an exact match at
        # distance 0 is not overwritten by a later, farther point
        smallest = None
        for y in range(len(shape_points)):
            shape_lat = shape_points['shape_pt_lat'].iloc[y]
            shape_lon = shape_points['shape_pt_lon'].iloc[y]
            shape_coordinate = (shape_lat, shape_lon)
            haversined = haversine_mi(stop_coordinate, shape_coordinate)
            if smallest is None or haversined < smallest:
                smallest = haversined
                smallest_shape_pt_indexer = y
            print(haversined)
            print('{0:.20f}'.format(smallest))

        print('{0:.20f}'.format(smallest))
        print(smallest_shape_pt_indexer)

        # full index key (shape_id, shape_pt_sequence) of the closest point
        closest = (trips_shape_id, shape_points.index[smallest_shape_pt_indexer])

        # mark is_stop as 1
        shapes.loc[closest, 'is_stop'] = 1

        # replace coordinate value; a single .loc call writes to shapes itself,
        # whereas chained indexing only modifies a temporary copy
        shapes.loc[closest, ['shape_pt_lat', 'shape_pt_lon']] = [stop_lat, stop_lon]

# keep the index so shape_id and shape_pt_sequence are written back out
shapes.to_csv('csv/shapes.csv')
</code></pre>
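<p>To make the "vector operation" I am after more concrete, here is a minimal sketch of the matching step for a single trip, written with NumPy broadcasting instead of the two inner loops. It is only an illustration under assumptions: <code>haversine_mi_vec</code>, <code>nearest_shape_points</code> and <code>EARTH_RADIUS_MI</code> are hypothetical names (my own <code>haversine_mi</code> may use a different radius), and the toy coordinates are made up rather than taken from the gists.</p>

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def haversine_mi_vec(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; inputs in degrees, broadcast as NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_MI * np.arcsin(np.sqrt(a))


def nearest_shape_points(stops_df, shape_df):
    """For each stop row, return the position of the closest shape point."""
    # (n_stops, 1) column vectors against (n_points,) rows -> (n_stops, n_points)
    dist = haversine_mi_vec(
        stops_df['stop_lat'].to_numpy()[:, None],
        stops_df['stop_lon'].to_numpy()[:, None],
        shape_df['shape_pt_lat'].to_numpy(),
        shape_df['shape_pt_lon'].to_numpy(),
    )
    return dist.argmin(axis=1)


# Toy data standing in for one trip's stops and its shape (values are made up).
shape_df = pd.DataFrame({
    'shape_pt_sequence': [1, 2, 3, 4],
    'shape_pt_lat': [45.000, 45.010, 45.020, 45.030],
    'shape_pt_lon': [-93.000, -93.000, -93.000, -93.000],
    'is_stop': [0, 0, 0, 0],
})
stops_df = pd.DataFrame({
    'stop_lat': [45.0101, 45.0299],
    'stop_lon': [-93.0002, -93.0001],
})

nearest = nearest_shape_points(stops_df, shape_df)
rows = shape_df.index[nearest]  # translate positions into index labels

# Snap the matched shape points to the stop coordinates and flag them.
shape_df.loc[rows, ['shape_pt_lat', 'shape_pt_lon']] = (
    stops_df[['stop_lat', 'stop_lon']].to_numpy()
)
shape_df.loc[rows, 'is_stop'] = 1
print(shape_df)
```

<p>On the real data this would presumably have to run once per trip against the shape that trip points to, so the outer loop over trips would remain; whether that is the best approach is part of what I am asking.</p>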