<p>First convert the <code>date</code> column to a pandas <code>datetime</code> series using <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html" rel="nofollow noreferrer"><code>pd.to_datetime</code></a>:</p>
<pre><code>df['date'] = pd.to_datetime(df['date'], dayfirst=True)
</code></pre>
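<p>A minimal sketch of what <code>dayfirst=True</code> does, assuming the raw dates are strings in DD/MM/YYYY order (the two strings below are hypothetical, not taken from the question). Without the flag, an ambiguous string such as <code>03/01/2020</code> would be read month-first as 1 March.</p>

```python
import pandas as pd

# Hypothetical raw strings, assumed to be day/month/year
raw = pd.Series(['03/01/2020', '21/01/2020'])

# dayfirst=True makes pandas read '03/01/2020' as 3 January, not 1 March
parsed = pd.to_datetime(raw, dayfirst=True)
print(parsed)
```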
<p><strong>Then use:</strong></p>
<pre><code>g = df.groupby('index')['date'].diff().dt.days.ne(1).cumsum()  # STEP A
m = df.groupby(['index', g])['hats'].transform('max').eq(df['hats'])  # STEP B
df = df.assign(high_hats=df['hats'].mask(~m), high_date=df['date'].mask(~m))  # STEP C
dct = {'start_date': ('date', 'first'), 'end_date': ('date', 'last'), 'high_hat': ('hats', 'max'),
       'high_hat_date': ('high_date', 'first'), 'num_hats': ('high_hats', 'count')}
# g is named 'date', so reset_index() turns it into a 'date' column that is dropped here
df1 = df.groupby(['index', g]).agg(**dct).reset_index().drop(columns='date')  # STEP D
</code></pre>
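<p>For a self-contained run of the whole pipeline, here is a sketch that rebuilds the input from the intermediate table printed under STEP C. The DD/MM/YYYY string format is an assumption chosen to match <code>dayfirst=True</code>, and the approach relies on rows already being sorted by <code>date</code> within each <code>index</code>, because <code>diff</code> compares each row with the previous one.</p>

```python
import pandas as pd

# Sample input rebuilt from the STEP C output; date strings assumed to be DD/MM/YYYY
df = pd.DataFrame({
    'index': ['A1'] * 7 + ['A6'] * 2 + ['A8'],
    'date': ['01/01/2020', '02/01/2020', '03/01/2020', '04/01/2020',
             '21/01/2020', '22/01/2020', '23/01/2020',
             '20/03/2020', '21/03/2020', '30/07/2020'],
    'hats': [5, 10, 16, 16, 9, 8, 7, 5, 5, 12],
})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# STEP A: run id that increments whenever the gap to the previous date is not exactly 1 day
g = df.groupby('index')['date'].diff().dt.days.ne(1).cumsum()
# STEP B: True on rows whose hats equals the maximum of their (index, run) group
m = df.groupby(['index', g])['hats'].transform('max').eq(df['hats'])
# STEP C: keep hats/date only on the maximum rows, NaN/NaT elsewhere
df = df.assign(high_hats=df['hats'].mask(~m), high_date=df['date'].mask(~m))
# STEP D: one output row per (index, run); drop the run id column named 'date'
dct = {'start_date': ('date', 'first'), 'end_date': ('date', 'last'), 'high_hat': ('hats', 'max'),
       'high_hat_date': ('high_date', 'first'), 'num_hats': ('high_hats', 'count')}
df1 = df.groupby(['index', g]).agg(**dct).reset_index().drop(columns='date')
print(df1)
```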
<p><strong>Details:</strong></p>
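<p>STEP A: Use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>DataFrame.groupby</code></a> on <code>index</code> and <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.diff.html" rel="nofollow noreferrer"><code>groupby.diff</code></a> on <code>date</code> to compute the number of days elapsed between consecutive dates within each <code>index</code>, then use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.days.html" rel="nofollow noreferrer"><code>Series.dt.days</code></a> + <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ne.html" rel="nofollow noreferrer"><code>Series.ne</code></a> and <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.cumsum.html" rel="nofollow noreferrer"><code>Series.cumsum</code></a> to create the grouper series <code>g</code>, which is needed to group the dataframe by runs of consecutive dates. Any gap other than exactly one day (including the first row of each <code>index</code>, whose difference is <code>NaT</code>) evaluates to <code>True</code> and therefore starts a new run.</p>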
<pre><code># print(g)
0    1
1    1
2    1
3    1
4    2
5    2
6    2
7    3
8    3
9    4
Name: date, dtype: int64
</code></pre>
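<p>The <code>diff</code> / <code>ne</code> / <code>cumsum</code> idiom from STEP A can be seen in isolation on a hypothetical single group of four dates (illustrative values, not from the question) that form two runs separated by a gap:</p>

```python
import pandas as pd

# Hypothetical dates: two runs of consecutive days separated by a 3-day gap
dates = pd.Series(pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-05', '2020-01-06']))

gaps = dates.diff().dt.days   # NaN, 1.0, 3.0, 1.0
starts = gaps.ne(1)           # True where a new run begins (the leading NaN also counts)
run_id = starts.cumsum()      # 1, 1, 2, 2
print(run_id.tolist())
```

<p>Because <code>True</code> sums as 1, the cumulative sum increases by one exactly at each run boundary, so every row in a run shares the same id.</p>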
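<p>STEP B: Use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>DataFrame.groupby</code></a> on <code>index</code> and <code>g</code>, transform the column <code>hats</code> using <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html" rel="nofollow noreferrer"><code>groupby.transform</code></a> with <code>max</code>, then use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.eq.html" rel="nofollow noreferrer"><code>Series.eq</code></a> to compare it with the <code>hats</code> column, creating the boolean mask <code>m</code>. The mask is <code>True</code> on every row whose <code>hats</code> value equals the maximum of its run, so ties (rows 2 and 3 below) are all kept.</p>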
<pre><code># print(m)
0    False
1    False
2     True
3     True
4     True
5    False
6    False
7     True
8     True
9     True
Name: hats, dtype: bool
</code></pre>
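<p>The key point in STEP B is that <code>transform</code> returns one value per original row (the group result broadcast back), unlike <code>agg</code>, which returns one value per group; that is what makes the element-wise comparison with <code>hats</code> possible. A sketch on hypothetical toy data, where <code>run</code> stands in for <code>g</code>:</p>

```python
import pandas as pd

# Hypothetical toy data: two runs, the first with a tied maximum
toy = pd.DataFrame({'run': [1, 1, 1, 2, 2], 'hats': [5, 16, 16, 9, 8]})

per_row_max = toy.groupby('run')['hats'].transform('max')  # 16, 16, 16, 9, 9 (same length as toy)
is_max = per_row_max.eq(toy['hats'])                       # False, True, True, True, False
print(is_max.tolist())
```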
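<p>STEP C: Next, use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html" rel="nofollow noreferrer"><code>DataFrame.assign</code></a> to add two new columns, <code>high_hats</code> and <code>high_date</code>. They keep the <code>hats</code> and <code>date</code> values only on rows where <code>m</code> is <code>True</code> (<code>mask(~m)</code> sets every other row to <code>NaN</code>/<code>NaT</code>), and they are used in <code>STEP D</code> to compute <code>high_hat_date</code> and <code>num_hats</code>.</p>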
<pre><code># print(df)
  index       date  hats  high_hats   high_date
0    A1 2020-01-01     5        NaN         NaT
1    A1 2020-01-02    10        NaN         NaT
2    A1 2020-01-03    16       16.0  2020-01-03
3    A1 2020-01-04    16       16.0  2020-01-04
4    A1 2020-01-21     9        9.0  2020-01-21
5    A1 2020-01-22     8        NaN         NaT
6    A1 2020-01-23     7        NaN         NaT
7    A6 2020-03-20     5        5.0  2020-03-20
8    A6 2020-03-21     5        5.0  2020-03-21
9    A8 2020-07-30    12       12.0  2020-07-30
</code></pre>
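<p><code>Series.mask(cond)</code> replaces values where <code>cond</code> is <code>True</code>, which is why STEP C passes the inverted mask <code>~m</code>. A small sketch with hypothetical values shows this, and that <code>Series.where(m)</code> is the equivalent positive form:</p>

```python
import pandas as pd

# Hypothetical values and mask
hats = pd.Series([5, 16, 16, 9])
m = pd.Series([False, True, True, True])

# Blank out the rows that are NOT the run maximum; the int column becomes float to hold NaN
high_hats = hats.mask(~m)
print(high_hats)

# where() keeps values where the condition is True, so this gives the same result
same = hats.where(m)
```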
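<p>STEP D: Use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>DataFrame.groupby</code></a> on <code>index</code> and <code>g</code>, and aggregate the dataframe with the aggregation dictionary <code>dct</code>, which maps each output column to a <code>(column, aggregation function)</code> pair (named aggregation). Because <code>first</code> and <code>count</code> skip missing values, <code>high_hat_date</code> is the earliest date on which the run reached its maximum and <code>num_hats</code> is the number of rows at that maximum.</p>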
<pre><code># print(df1)
  index start_date   end_date  high_hat high_hat_date  num_hats
0    A1 2020-01-01 2020-01-04        16    2020-01-03         2
1    A1 2020-01-21 2020-01-23         9    2020-01-21         1
2    A6 2020-03-20 2020-03-21         5    2020-03-20         2
3    A8 2020-07-30 2020-07-30        12    2020-07-30         1
</code></pre>
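<p>STEP D depends on <code>first</code> and <code>count</code> ignoring the <code>NaN</code>/<code>NaT</code> values introduced in STEP C. A sketch on hypothetical toy columns (named like the ones from STEP C, with <code>run</code> standing in for <code>g</code>) isolates that behaviour using the same named-aggregation syntax:</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like the STEP C output
toy = pd.DataFrame({
    'run': [1, 1, 1, 2],
    'high_hats': [np.nan, 16.0, 16.0, 9.0],
    'high_date': pd.to_datetime([None, '2020-01-03', '2020-01-04', '2020-01-21']),
})

out = toy.groupby('run').agg(
    high_hat_date=('high_date', 'first'),  # first non-null date in each run
    num_hats=('high_hats', 'count'),       # number of non-null values in each run
)
print(out)
```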