<ol>
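<li>Use <code>shift()</code> to start a new group whenever a row's <code>start_time</code> is <code>greater than</code> the <code>end_time</code> of the row above (i.e. there is a gap between them; rows that fail this test overlap the previous row and stay in its group).</li>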
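<li><code>shift()</code> leaves <code>NaN</code> in the first row, so we use <code>fillna</code> with <code>'24:00:00'</code> to replace it. No value can be greater than 24 hours in a day, so the first row is compared against a real string rather than a missing value, always evaluates to <code>False</code>, and lands in group <code>0</code>.</li>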
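<li>The comparison returns a <code>boolean</code> series of <code>True</code> and <code>False</code> (i.e. <code>1</code> and <code>0</code> respectively), so simply take the cumulative sum with <code>cumsum</code> to number the groups.</li>
<li>This creates a <code>grp</code> object that we can include in the <code>groupby</code>.</li>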
</ol>
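<p>The steps above can be traced one at a time. This is a minimal sketch, assuming a hypothetical three-row frame named <code>toy</code>; the intermediate names <code>prev_end</code> and <code>new_group</code> are illustrative only and do not appear in the answer's code.</p>

```python
import pandas as pd

# Hypothetical toy data: three bookings on one court, already sorted by start_time.
toy = pd.DataFrame({
    'padel': ['Padel 10'] * 3,
    'start_time': ['08:00:00', '10:00:00', '10:30:00'],
    'end_time': ['09:00:00', '11:30:00', '12:00:00'],
})

# end_time of the row above; the first row has no predecessor, so it gets NaN.
prev_end = toy['end_time'].shift()

# Fill that NaN with '24:00:00' so the first comparison is string against string.
prev_end = prev_end.fillna('24:00:00')

# True where a booking starts after the previous one ended (a gap, so a new group).
new_group = toy['start_time'].gt(prev_end)

# The running total of True values gives each row its group number.
grp = new_group.cumsum()

print(new_group.tolist())  # [False, True, False]
print(grp.tolist())        # [0, 1, 1]
```

<p>The comparison works on plain strings only because every time is zero-padded <code>HH:MM:SS</code>, so lexicographic order matches chronological order.</p>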
<hr/>
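<pre><code># Sort so that each row is compared with the previous booking on the same court
df = df.sort_values(by=['padel', 'start_time'], ascending=[True, True])
# A new group starts wherever start_time is after the previous row's end_time
grp = df['start_time'].gt(df['end_time'].shift().fillna('24:00:00')).cumsum()
# Collapse each group to its first start_time and last end_time
df = df.groupby([grp, 'padel'], as_index=False).agg({'start_time':'first', 'end_time':'last'})
# Duration of each merged slot in minutes
df['duration'] = ((pd.to_timedelta(df['end_time']) -
                   pd.to_timedelta(df['start_time'])).dt.seconds / 60).astype(int)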
Out[1]:
      padel start_time  end_time  duration
0  Padel 10   08:00:00  09:00:00        60
1  Padel 10   10:00:00  13:00:00       180
2  Padel 10   16:00:00  22:00:00       360
</code></pre>
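<p>The duration step can also be looked at in isolation. This is a minimal sketch, assuming a hypothetical one-row frame named <code>slot</code>. It swaps in <code>.dt.total_seconds()</code> where the answer uses <code>.dt.seconds</code>: the two agree for slots within a single day, but <code>.dt.seconds</code> returns only the seconds component and ignores whole days.</p>

```python
import pandas as pd

# Hypothetical merged slot, using the same 'HH:MM:SS' strings as the answer.
slot = pd.DataFrame({'start_time': ['10:00:00'], 'end_time': ['13:00:00']})

# 'HH:MM:SS' strings parse directly into Timedelta values.
elapsed = pd.to_timedelta(slot['end_time']) - pd.to_timedelta(slot['start_time'])

# total_seconds() covers the whole span; dividing by 60 converts it to minutes.
slot['duration'] = (elapsed.dt.total_seconds() / 60).astype(int)

print(slot['duration'].tolist())  # [180]
```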
<hr/>
<p><strong>Full code with input dataframe</strong></p>
<pre><code>import pandas as pd

df = pd.DataFrame(pd.DataFrame({'padel': {38: 'Padel 10',
40: 'Padel 10',
42: 'Padel 10',
44: 'Padel 10',
46: 'Padel 10',
49: 'Padel 10',
51: 'Padel 10',
53: 'Padel 10',
55: 'Padel 10',
57: 'Padel 10',
59: 'Padel 10',
61: 'Padel 10',
63: 'Padel 10',
65: 'Padel 10',
67: 'Padel 10'},
'start_time': {38: '08:00:00',
40: '10:00:00',
42: '10:30:00',
44: '11:00:00',
46: '11:30:00',
49: '16:00:00',
51: '16:30:00',
53: '17:00:00',
55: '17:30:00',
57: '18:00:00',
59: '18:30:00',
61: '19:00:00',
63: '19:30:00',
65: '20:00:00',
67: '20:30:00'},
'end_time': {38: '09:00:00',
40: '11:30:00',
42: '12:00:00',
44: '12:30:00',
46: '13:00:00',
49: '17:30:00',
51: '18:00:00',
53: '18:30:00',
55: '19:00:00',
57: '19:30:00',
59: '20:00:00',
61: '20:30:00',
63: '21:00:00',
65: '21:30:00',
67: '22:00:00'},
'duration': {38: 60,
40: 90,
42: 90,
44: 90,
46: 90,
49: 90,
51: 90,
53: 90,
55: 90,
57: 90,
59: 90,
61: 90,
63: 90,
65: 90,
67: 90}}))
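# Sort so that each row is compared with the previous booking on the same court
df = df.sort_values(by=['padel', 'start_time'], ascending=[True, True])
# A new group starts wherever start_time is after the previous row's end_time
grp = df['start_time'].gt(df['end_time'].shift().fillna('24:00:00')).cumsum()
# Collapse each group to its first start_time and last end_time
df = df.groupby([grp, 'padel'], as_index=False).agg({'start_time':'first', 'end_time':'last'})
# Duration of each merged slot in minutes
df['duration'] = ((pd.to_timedelta(df['end_time']) -
                   pd.to_timedelta(df['start_time'])).dt.seconds / 60).astype(int)
df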
</code></pre>