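<p>The code below works for the chunk of data you provided. If it doesn't work on your actual data, please let me know. There may be better ways to do this, but I think it is a good starting point.</p>
<p>The general idea is to group by passenger to determine each route. Then, because you want a daily average, group by date and then by route (origin and destination pair) to compute that average.</p>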
<pre><code>import pandas as pd

# Define a function to get each passenger's route (origin vs. destination)
def get_routes(x):
    if 'transfer' not in x.type.tolist():  # no 'transfer' rows in this group: return 0 (removed afterwards)
        return 0
    x = x[x.type == 'transfer']  # keep only the target type
    date = x.Date.dt.strftime('%m/%d/%Y').unique()  # dates of this passenger's transfers
    if date.size == 1:  # if a passenger can have more than one date, you'll need to change this code
        date = date[0]
    else:
        raise Exception("There is more than one date per passenger; please adapt your code.")
    s_from = x.routeName[x.Date.idxmin()]  # route at the earliest timestamp
    s_to = x.routeName[x.Date.idxmax()]  # route at the latest timestamp
    return date, s_from, s_to

# Define a function to get the routes' daily average
def get_daily_avg(date_group):
    daily_avg = (
        date_group.groupby(['From', 'To'], as_index=False)  # group the day by route
        .apply(lambda route: route.shape[0] / date_group.shape[0])  # trips on that route divided by all trips that day
    )
    return daily_avg

# Get each passenger's route
routes_series = df.groupby('cardNumber').apply(get_routes)  # retrieve routes per passenger
routes_series = routes_series[routes_series != 0]  # remove groups without the target type

# Create a named dataframe from the series output
routes_df = pd.DataFrame(routes_series.tolist(), columns=['Date', 'From', 'To'])

# Create dataframe, perform filter and calculations
daily_routes_df = (
    routes_df.query('From != To')  # remove routes whose destination equals the origin
    .groupby('Date').apply(get_daily_avg)  # calculate the average per date
    .rename(columns={None: 'Avg. Daily'})  # name the previous output
    .drop(['From', 'To'], axis=1)  # drop redundant columns, since the same info is in the index
    .reset_index()  # remove the MultiIndex to get a tidy dataframe
)

# Visualize results
print(daily_routes_df)
</code></pre>
<p>Output:</p>
<pre><code>         Date  From  To  Avg. Daily
0  08/01/2020     2   1         1.0
</code></pre>
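<p>The average here is 1 because each group has only one count. Note that only the 'transfer' type is considered. Passengers with no transfer rows, or whose route did not change, are removed along the way.</p>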
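<p>To see values other than 1, here is a minimal, self-contained sketch of the same idea on made-up data. The column names (<code>cardNumber</code>, <code>Date</code>, <code>type</code>, <code>routeName</code>) are taken from the code above, but the rows are invented for illustration, and the <code>Day</code>, <code>Trips</code> and <code>Share</code> names are my own. It uses named aggregation and <code>transform</code> instead of <code>apply</code>, so treat it as one possible variant rather than the only way to do it.</p>

```python
import pandas as pd

# Hypothetical toy data: column names follow the answer, values are invented.
df = pd.DataFrame({
    'cardNumber': ['999', '999', '999', '111', '111', '222', '333', '333'],
    'Date': pd.to_datetime([
        '2020-08-01 08:00', '2020-08-01 08:20', '2020-08-01 08:40',
        '2020-08-01 09:00', '2020-08-01 09:30', '2020-08-01 10:00',
        '2020-08-01 11:00', '2020-08-01 11:15',
    ]),
    'type': ['purchase', 'transfer', 'transfer', 'transfer',
             'transfer', 'purchase', 'transfer', 'transfer'],
    'routeName': [3, 2, 1, 2, 1, 5, 1, 3],
})

# Keep only transfers, ordered in time, and add a day label
transfers = df[df['type'] == 'transfer'].sort_values('Date')
transfers = transfers.assign(Day=transfers['Date'].dt.strftime('%m/%d/%Y'))

# First and last transfer route per passenger and day
routes = (
    transfers.groupby(['cardNumber', 'Day'])['routeName']
    .agg(From='first', To='last')
    .reset_index()
)
routes = routes[routes['From'] != routes['To']]  # drop trips that end where they started

# Count trips per (Day, From, To) and divide by that day's total trips
counts = routes.groupby(['Day', 'From', 'To']).size().rename('Trips').reset_index()
counts['Share'] = counts['Trips'] / counts.groupby('Day')['Trips'].transform('sum')

print(counts)
```

<p>With this toy data, two passengers go from route 2 to route 1 and one goes from route 1 to route 3, so the two rows get shares of 2/3 and 1/3 of that day's trips.</p>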