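I have a DataFrame df with the following structure: dataId, nodeId, tickDatetime.
This dataset records the time (tickDatetime) at which an element (dataId) passes through a node (nodeId).
Here is an example: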
dataId nodeId tickDatetime
0 data-0 node-01 3000
1 data-0 node-02 5000
2 data-1 node-02 4000
3 data-1 node-01 6000
4 data-0 node-01 8000
5 data-0 node-00 10000
... ... ...
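From this DataFrame, I want to create a new DataFrame, routes, that contains the sequence of nodes and the travel times for each dataId.
To do that, I did the following: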
import numpy as np
import pandas as pd

# One row per dataId, with nodes and timestamps collected in time order
routes = df.sort_values('tickDatetime').groupby('dataId').agg({'nodeId': list, 'tickDatetime': list})

def datetimes_to_travel_times(datetimes):
    # Time elapsed since the previous node; the first node gets 0
    traveltimes = np.empty(len(datetimes))
    old_value = datetimes[0]
    traveltimes[0] = 0
    for i in range(1, len(datetimes)):
        traveltimes[i] = datetimes[i] - old_value
        old_value = datetimes[i]
    return traveltimes

routes['traveltimes'] = routes['tickDatetime'].apply(datetimes_to_travel_times)
This gives me the expected output (though it may not be the best way to do it):
dataId nodeId tickDatetime traveltimes
0 data-0 [node-01,node-02,node-01,node-00] [3000,5000,8000,10000] [0,2000,3000,2000]
1 data-1 [node-02,node-01] [4000,6000] [0,2000]
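On the "may not be the best way" point, here is a minimal sketch of one possible alternative that avoids the Python loop by taking per-group differences with groupby(...).diff(). The sample data is copied from the example above; the names ordered and routes_alt are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Sample data copied from the example in the question.
df = pd.DataFrame({
    "dataId": ["data-0", "data-0", "data-1", "data-1", "data-0", "data-0"],
    "nodeId": ["node-01", "node-02", "node-02", "node-01", "node-01", "node-00"],
    "tickDatetime": [3000, 5000, 4000, 6000, 8000, 10000],
})

# Sort by element, then by time, so each element's ticks are contiguous and ordered.
ordered = df.sort_values(["dataId", "tickDatetime"]).reset_index(drop=True)

# diff() within each group gives the time since the previous node.
# The first tick of each element has no predecessor (NaN), so fill it with 0.
ordered["traveltime"] = (
    ordered.groupby("dataId")["tickDatetime"].diff().fillna(0).astype(int)
)

# Collapse to one row per element, collecting each column into a list.
routes_alt = ordered.groupby("dataId").agg(
    nodeId=("nodeId", list),
    tickDatetime=("tickDatetime", list),
    traveltimes=("traveltime", list),
).reset_index()

print(routes_alt)
```

Keeping the data in long form until the final agg step means the travel times come from a single vectorized diff rather than from a function applied to each list.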
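Now I want each route to be split whenever a travel time reaches or exceeds some threshold.
For example, with a threshold of 3000, I would like my routes DataFrame to look like this: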
dataId routeId nodeId tickDatetime traveltimes
0 data-0 0 [node-01,node-02] [3000,5000] [0,2000]
1 data-0 1 [node-01,node-00] [8000,10000] [0,2000]
2 data-1 0 [node-02,node-01] [4000,6000] [0,2000]
How can I achieve this with pandas?
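One possible direction, as a rough sketch rather than a definitive answer: work on the long-form data, flag every tick whose gap to the previous tick reaches the threshold, and turn a per-element cumulative sum of those flags into a route number. The sample data and the threshold of 3000 come from the example above; the names ordered, gap, new_route and split_routes_alt are hypothetical. The comparison uses >= because the example splits on a gap of exactly 3000.

```python
import pandas as pd

# Sample data copied from the example in the question.
df = pd.DataFrame({
    "dataId": ["data-0", "data-0", "data-1", "data-1", "data-0", "data-0"],
    "nodeId": ["node-01", "node-02", "node-02", "node-01", "node-01", "node-00"],
    "tickDatetime": [3000, 5000, 4000, 6000, 8000, 10000],
})
threshold = 3000  # example threshold from the question

ordered = df.sort_values(["dataId", "tickDatetime"]).reset_index(drop=True)

# Gap to the previous tick of the same element (NaN for the first tick).
gap = ordered.groupby("dataId")["tickDatetime"].diff()

# A new route starts where the gap reaches the threshold.
# NaN >= threshold evaluates to False, so first ticks never start a "new" route.
new_route = gap >= threshold

# Counting the route starts seen so far within each element yields 0, 1, 2, ...
ordered["routeId"] = new_route.astype(int).groupby(ordered["dataId"]).cumsum()

# Travel times restart at 0 at the first tick of every route.
ordered["traveltime"] = (
    ordered.groupby(["dataId", "routeId"])["tickDatetime"]
    .diff()
    .fillna(0)
    .astype(int)
)

# One row per (element, route), with each column collected into a list.
split_routes_alt = ordered.groupby(["dataId", "routeId"]).agg(
    nodeId=("nodeId", list),
    tickDatetime=("tickDatetime", list),
    traveltimes=("traveltime", list),
).reset_index()

print(split_routes_alt)
```

On the sample data this yields two routes for data-0 and one for data-1, matching the desired table above.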
Edit:
I managed to solve my problem:
def split_routes(row):
    # Note: only nodeId and traveltimes are split; tickDatetime is left as the full list
    threshold = 3000
    nodes = row['nodeId']
    traveltimes = row['traveltimes']
    rows = []
    route_id = 0
    route_nodes = []
    route_traveltimes = []
    for i in range(len(traveltimes)):
        if traveltimes[i] < threshold:
            route_nodes.append(nodes[i])
            route_traveltimes.append(traveltimes[i])
        else:
            # Route route_id completed: store a copy so later changes do not overwrite it
            new_row = row.copy()
            new_row['route_id'] = route_id
            new_row['nodeId'] = route_nodes
            new_row['traveltimes'] = route_traveltimes
            rows.append(new_row)
            # Start a new route at the current node, with fresh lists
            route_id += 1
            route_nodes = [nodes[i]]
            route_traveltimes = [0]
    # Store the last route
    new_row = row.copy()
    new_row['route_id'] = route_id
    new_row['nodeId'] = route_nodes
    new_row['traveltimes'] = route_traveltimes
    rows.append(new_row)
    return pd.DataFrame(rows)

splitted_routes_array = []
for index, row in routes.iterrows():
    splitted_routes_array.append(split_routes(row))
splitted_routes = pd.concat(splitted_routes_array)
Output: