Python Pandas: How to combine trip segments into a single journey using transit smart card data

  CustomerID  SegmentID Origin Dest StartTime  EndTime  Fare    Type
0       A001        101      A    B    7:30am   7:45am   1.5     Bus
1       A001        102      B    C    7:50am   8:30am   3.5   Train
2       A001        103      C    B   17:10pm  18:00pm   3.5   Train
3       A001        104      B    A   18:10pm  18:30pm   1.5     Bus
4       A002        105      K    Y   11:30am  12:30pm   3.0   Train
5       A003        106      P    O   10:23am  11:13am   4.0  Ferrie

1 answer

User

#1 · Posted 2024-10-03 04:27:32

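Here is a fairly complete answer. You didn't spell out what counts as a single journey, so I made a guess. You can adjust the mask below to better fit your own definition.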

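import pandas as pd

# Strip the trailing am/pm and parse as 24-hour times; the sample data
# already uses 24-hour values (e.g. 17:10pm), so the suffix carries no information.
# No date is specified, so pandas fills in 1900-01-01; that doesn't matter here.
df['StTime'] = pd.to_datetime(df.StartTime.str[:-2], format='%H:%M')
df['EndTime'] = pd.to_datetime(df.EndTime.str[:-2], format='%H:%M')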

# some of the later processing is easier if you use duration
# instead of arrival time
df['Duration'] = df.EndTime-df.StTime

# keep only the columns needed below; .copy() avoids a SettingWithCopyWarning
# when new columns are assigned later
df = df[['CustomerID','Origin','Dest','StTime','Duration','Fare','Type']].copy()

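First, we need to work out a way to group the rows. The question doesn't specify one, so I'll group rows that share a CustomerID and whose start times are within one hour of each other. Note that for a three-leg trip this implicitly means the start times of the first and third legs can differ by more than an hour, as long as first+second and second+third are each within one hour. This seems like a natural way to do it, but for a real use case you'd have to adjust it to the definition you want. There are many ways you could proceed here.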

[code block defining `mask` not preserved in the source]
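The block that defines `mask` is missing here. The following is a minimal sketch of one way to build it, assuming the rule described above (same CustomerID as the previous row, and start time no more than one hour after the previous row's start time). It is a reconstruction for illustration, not the answerer's original code; the names `same_customer` and `within_hour` are hypothetical, and the rows are assumed to be sorted by customer and start time.

```python
import pandas as pd

# Sample rows from the question, reduced to the two columns the rule needs.
df = pd.DataFrame({
    "CustomerID": ["A001", "A001", "A001", "A001", "A002", "A003"],
    "StTime": pd.to_datetime(
        ["07:30", "07:50", "17:10", "18:10", "11:30", "10:23"], format="%H:%M"
    ),
})

# Assumption: comparing each row with the one before it only makes sense
# when rows are ordered by customer and then by start time.
df = df.sort_values(["CustomerID", "StTime"]).reset_index(drop=True)

# True where the row belongs to the same customer as the previous row.
same_customer = df["CustomerID"] == df["CustomerID"].shift(1)

# True where the row starts at most one hour after the previous row started.
# The first row compares against NaT, which evaluates to False.
within_hour = (df["StTime"] - df["StTime"].shift(1)) <= pd.Timedelta(hours=1)

# True means "this row continues the previous row's journey".
mask = same_customer & within_hour
print(mask.tolist())  # [False, True, False, True, False, False]
```

Because each row is compared only with its immediate predecessor, legs chain together: three legs starting at 8:00, 8:50 and 9:40 would all fall into one journey even though the first and third start more than an hour apart.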

Now we can use the mask together with cumsum to generate the journey ID:

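# Rows that start a journey get 1, rows that continue one get 0;
# the running sum then numbers the journeys.
df['JourneyID'] = 1
df.loc[mask, 'JourneyID'] = 0
df['JourneyID'] = df['JourneyID'].cumsum()
df['NumTrips'] = 1

df[['CustomerID','StTime','Fare','JourneyID']]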

  CustomerID              StTime  Fare  JourneyID
0       A001 1900-01-01 07:30:00   1.5          1
1       A001 1900-01-01 07:50:00   3.5          1
2       A001 1900-01-01 17:10:00   3.5          2
3       A001 1900-01-01 18:10:00   1.5          2
4       A002 1900-01-01 11:30:00   3.0          3
5       A003 1900-01-01 10:23:00   4.0          4
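The same numbering can be written more compactly by negating the mask and taking its running sum. This is only a sketch of an equivalent idiom; the `mask` values below are typed in by hand from the output above rather than computed.

```python
import pandas as pd

# True where a row continues the previous row's journey
# (values copied from the example output above).
mask = pd.Series([False, True, False, True, False, False])

# Every False marks the start of a new journey, so counting the
# journey starts seen so far gives each row its journey number.
journey_id = (~mask).cumsum()
print(journey_id.tolist())  # [1, 1, 2, 2, 3, 4]
```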

Now just aggregate each column appropriately:

# 'sum' concatenates the string columns and adds up the numeric and timedelta ones
df2 = df.groupby('JourneyID').agg({ 'Origin' : 'sum', 'CustomerID' : 'min',
                                    'Dest'   : 'sum', 'StTime'     : 'min',
                                    'Fare'   : 'sum', 'Duration'   : 'sum',
                                    'Type'   : 'sum', 'NumTrips'   : 'sum' })

                      StTime Dest Origin Fare Duration     Type CustomerID NumTrips
JourneyID                                                                               
1        1900-01-01 07:30:00   BC     AB    5 00:55:00 BusTrain       A001        2
2        1900-01-01 17:10:00   BA     CB    5 01:10:00 TrainBus       A001        2
3        1900-01-01 11:30:00    Y      K    3 01:00:00    Train       A002        1
4        1900-01-01 10:23:00    O      P    4 00:50:00   Ferrie       A003        1
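Summing the string columns yields concatenations such as "AB" and "BusTrain". If you would rather keep the first origin and the last destination of each journey, one possible variant uses named aggregation. This is a sketch under that assumption, not part of the original answer; the `legs` frame and the "+" separator are illustrative choices.

```python
import pandas as pd

# A few legs from the example, already labelled with a JourneyID.
legs = pd.DataFrame({
    "JourneyID": [1, 1, 2, 2, 3],
    "Origin": ["A", "B", "C", "B", "K"],
    "Dest": ["B", "C", "B", "A", "Y"],
    "Fare": [1.5, 3.5, 3.5, 1.5, 3.0],
    "Type": ["Bus", "Train", "Train", "Bus", "Train"],
})

journeys = legs.groupby("JourneyID").agg(
    Origin=("Origin", "first"),                 # where the journey began
    Dest=("Dest", "last"),                      # where the journey ended
    Fare=("Fare", "sum"),                       # total fare over all legs
    Type=("Type", lambda s: "+".join(s)),       # modes in travel order
    NumTrips=("Fare", "count"),                 # number of legs
)
print(journeys)
```

This relies on the rows within each journey already being in chronological order, since "first" and "last" follow row order.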

Note that the duration includes only the travel time and not the time between legs (for example, when the second leg starts later than the first leg ends).
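If you also want the door-to-door time including waits between legs, one option is to subtract the earliest start from the latest end within each journey. The sketch below assumes an `EndTime` column is still available (the answer drops it earlier), and the names `legs`, `elapsed` and `in_vehicle` are illustrative.

```python
import pandas as pd

# The four A001 legs from the example, with start and end times.
legs = pd.DataFrame({
    "JourneyID": [1, 1, 2, 2],
    "StTime": pd.to_datetime(["07:30", "07:50", "17:10", "18:10"], format="%H:%M"),
    "EndTime": pd.to_datetime(["07:45", "08:30", "18:00", "18:30"], format="%H:%M"),
})

grouped = legs.groupby("JourneyID")

# Door-to-door time: last arrival minus first departure, waits included.
elapsed = grouped["EndTime"].max() - grouped["StTime"].min()

# In-vehicle time only: the sum of the individual leg durations.
in_vehicle = (legs["EndTime"] - legs["StTime"]).groupby(legs["JourneyID"]).sum()

print(pd.DataFrame({"elapsed": elapsed, "in_vehicle": in_vehicle}))
```

For journey 1 this gives 1:00 elapsed against 0:55 in the vehicle, the 5-minute difference being the wait between the bus and the train.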
