Finding the start and end dates of consecutive days in a column for the same index with pandas


1 answer

Anonymous, 1 day ago

First, use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html" rel="nofollow noreferrer"><code>pd.to_datetime</code></a> to convert the <code>date</code> column to a <code>datetime</code> series:
<pre><code>import pandas as pd

df['date'] = pd.to_datetime(df['date'], dayfirst=True)
</code></pre>
Then use:
<pre><code>g = df.groupby('index')['date'].diff().dt.days.ne(1).cumsum()  # STEP A
m = df.groupby(['index', g])['hats'].transform('max').eq(df['hats'])  # STEP B
df = df.assign(high_hats=df['hats'].mask(~m), high_date=df['date'].mask(~m))  # STEP C

dct = {'start_date': ('date', 'first'), 'end_date': ('date', 'last'),
       'high_hat': ('hats', 'max'), 'high_hat_date': ('high_date', 'first'),
       'num_hats': ('high_hats', 'count')}
# drop(columns=...) replaces the positional form drop('date', 1),
# which newer pandas versions no longer accept
df1 = df.groupby(['index', g]).agg(**dct).reset_index().drop(columns='date')  # STEP D
</code></pre>
Details:

Step A: Use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>DataFrame.groupby</code></a> on <code>index</code> and <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.diff.html" rel="nofollow noreferrer"><code>groupby.diff</code></a> on <code>date</code> to compute the number of days elapsed between consecutive dates within each index. Then use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.days.html" rel="nofollow noreferrer"><code>Series.dt.days</code></a>, <code>Series.ne</code> and <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.cumsum.html" rel="nofollow noreferrer"><code>Series.cumsum</code></a> to build the grouping series <code>g</code>. Every row whose gap from the previous row is not exactly one day (including the first row of each index, where the gap is missing) starts a new group, so <code>g</code> labels each run of consecutive dates.
<pre><code># print(g)
0    1
1    1
2    1
3    1
4    2
5    2
6    2
7    3
8    3
9    4
Name: date, dtype: int64
</code></pre>
Step B: Use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>DataFrame.groupby</code></a> on <code>index</code> and <code>g</code>, transform the <code>hats</code> column with <code>max</code>, and then use <code>Series.eq</code> to compare the result with the <code>hats</code> column. This gives the boolean mask <code>m</code>, which is <code>True</code> on the rows that hold the maximum of their run.
<pre><code># print(m)
0    False
1    False
2     True
3     True
4     True
5    False
6    False
7     True
8     True
9     True
Name: hats, dtype: bool
</code></pre>
Step C: Next, use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html" rel="nofollow noreferrer"><code>DataFrame.assign</code></a> to add two new columns, <code>high_hats</code> and <code>high_date</code>, in which every row that is not a run maximum is masked out. They are used in <code>STEP D</code> to compute <code>high_hat_date</code> and <code>num_hats</code>.
<pre><code># print(df)
  index       date  hats  high_hats  high_date
0    A1 2020-01-01     5        NaN        NaT
1    A1 2020-01-02    10        NaN        NaT
2    A1 2020-01-03    16       16.0 2020-01-03
3    A1 2020-01-04    16       16.0 2020-01-04
4    A1 2020-01-21     9        9.0 2020-01-21
5    A1 2020-01-22     8        NaN        NaT
6    A1 2020-01-23     7        NaN        NaT
7    A6 2020-03-20     5        5.0 2020-03-20
8    A6 2020-03-21     5        5.0 2020-03-21
9    A8 2020-07-30    12       12.0 2020-07-30
</code></pre>
Step D: Use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>DataFrame.groupby</code></a> on <code>index</code> and <code>g</code>, and aggregate the dataframe with the dictionary <code>dct</code>, which maps each output column to its source column and the corresponding <code>aggregation</code> function.
<pre><code># print(df1)
  index start_date   end_date  high_hat high_hat_date  num_hats
0    A1 2020-01-01 2020-01-04        16    2020-01-03         2
1    A1 2020-01-21 2020-01-23         9    2020-01-21         1
2    A6 2020-03-20 2020-03-21         5    2020-03-20         2
3    A8 2020-07-30 2020-07-30        12    2020-07-30         1
</code></pre>
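For anyone who wants to run the four steps end to end, here is a minimal self-contained sketch. The sample frame is reconstructed from the printed output in the answer; the ISO date strings are an assumption, since the original input format is not shown (the answer passes <code>dayfirst=True</code>, which suggests day-first strings). Naming the grouper <code>run</code> and the result <code>summary</code> is also my own choice, made so the helper column is not confused with the real <code>date</code> column.

```python
import pandas as pd

# Sample data reconstructed from the printed frames in the answer.
# Assumption: ISO date strings; the original input format is not shown.
df = pd.DataFrame({
    'index': ['A1'] * 7 + ['A6'] * 2 + ['A8'],
    'date': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
             '2020-01-21', '2020-01-22', '2020-01-23',
             '2020-03-20', '2020-03-21', '2020-07-30'],
    'hats': [5, 10, 16, 16, 9, 8, 7, 5, 5, 12],
})
df['date'] = pd.to_datetime(df['date'])

# STEP A: a new run starts wherever the gap to the previous row
# (within the same index) is not exactly one day.
# 'run' is a hypothetical name chosen for this sketch.
g = (df.groupby('index')['date'].diff()
       .dt.days.ne(1).cumsum().rename('run'))

# STEP B: True on rows holding the maximum 'hats' of their run.
m = df.groupby(['index', g])['hats'].transform('max').eq(df['hats'])

# STEP C: keep 'hats' and 'date' only on the run-maximum rows.
df = df.assign(high_hats=df['hats'].mask(~m),
               high_date=df['date'].mask(~m))

# STEP D: one output row per (index, run) using named aggregation.
dct = {'start_date': ('date', 'first'),
       'end_date': ('date', 'last'),
       'high_hat': ('hats', 'max'),
       'high_hat_date': ('high_date', 'first'),
       'num_hats': ('high_hats', 'count')}
summary = (df.groupby(['index', g]).agg(**dct)
             .reset_index().drop(columns='run'))

print(summary)
```

With this sample data the printed frame should match the <code>df1</code> output shown in the answer. If the real data is not already sorted by <code>index</code> and <code>date</code>, sort it first, because <code>diff</code> compares each row with the one directly above it.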
