<p>You can avoid <code>groupby()</code> by working with the DataFrame's index.</p>
<p><strong>1. Using a RangeIndex</strong></p>
<p>Assume the following DataFrame:</p>
<pre><code>import pandas as pd
df = pd.DataFrame({
    'SalesDate': ['2016-12-20', '2016-12-21', '2016-12-22', '2016-12-23', '2016-12-24', '2016-12-25', '2016-12-26', '2016-12-27'],
    'holiday': [0, 0, 0, 0, 0, 1, 0, 0]
})
df
    SalesDate  holiday
0  2016-12-20        0
1  2016-12-21        0
2  2016-12-22        0
3  2016-12-23        0
4  2016-12-24        0
5  2016-12-25        1
6  2016-12-26        0
7  2016-12-27        0
</code></pre>
<p><strong>One-line solution:</strong></p>
<p>If the DataFrame uses a standard RangeIndex, you can combine <code>df.index.where()</code>, a backward fill, and arithmetic on <code>df.index</code>:</p>
<pre><code>df['next_holiday'] = pd.Series(df.index.where(df.holiday == 1), dtype='Int32').fillna(method='bfill') - df.index # Note: dtype'Int32' nullable int is new in pandas 0.24
</code></pre>
<p>Result:</p>
<pre><code>df
    SalesDate  holiday  next_holiday
0  2016-12-20        0             5
1  2016-12-21        0             4
2  2016-12-22        0             3
3  2016-12-23        0             2
4  2016-12-24        0             1
5  2016-12-25        1             0
6  2016-12-26        0           NaN
7  2016-12-27        0           NaN
</code></pre>
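<p>To make the one-liner easier to follow, here is a sketch that splits it into three steps. The intermediate names (<code>holiday_pos</code>, <code>next_holiday_pos</code>, <code>days_until</code>) are illustrative only and not part of the original answer; it assumes a default RangeIndex and pandas 0.24 or later.</p>

```python
import pandas as pd

df = pd.DataFrame({
    'SalesDate': ['2016-12-20', '2016-12-21', '2016-12-22', '2016-12-23',
                  '2016-12-24', '2016-12-25', '2016-12-26', '2016-12-27'],
    'holiday': [0, 0, 0, 0, 0, 1, 0, 0],
})

# Step 1: keep the row position where the row is a holiday, missing elsewhere.
# 'Int32' is the nullable integer dtype, so missing values do not force floats.
holiday_pos = pd.Series(df.index.where(df['holiday'] == 1),
                        index=df.index, dtype='Int32')

# Step 2: back-fill, so every row carries the position of the next holiday
# at or after it. Rows after the last holiday stay missing.
next_holiday_pos = holiday_pos.bfill()

# Step 3: the distance in rows is that position minus the row's own position.
days_until = next_holiday_pos - df.index

print(days_until.tolist())
```

<p>Because each row is exactly one day apart in this data, the row distance equals the number of days. If dates can be skipped, the row distance and the day count differ, and the date-based variant is likely the safer choice.</p>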
<p><strong>2. Using a DatetimeIndex</strong></p>
<p>If <code>SalesDate</code> is the index (with dtype <code>datetime64</code>), the solution is similar:</p>
<pre><code>df = df.set_index(pd.to_datetime(df.SalesDate)).drop(columns=['SalesDate'])
df
            holiday
SalesDate
2016-12-20        0
2016-12-21        0
2016-12-22        0
2016-12-23        0
2016-12-24        0
2016-12-25        1
2016-12-26        0
2016-12-27        0
</code></pre>
<p><strong>Solving with date arithmetic:</strong></p>
<pre><code>df['next_holiday'] = ((pd.Series(df.index.where(df.holiday == 1)).fillna(method='bfill') - df.index) / np.timedelta64(1, 'D'))
df['next_holiday'] = df['next_holiday'].astype('Int32') # pandas >= 0.24 for the nullable integer cast
</code></pre>
<p>Result:</p>
<pre><code>df
            holiday  next_holiday
SalesDate
2016-12-20        0             5
2016-12-21        0             4
2016-12-22        0             3
2016-12-23        0             2
2016-12-24        0             1
2016-12-25        1             0
2016-12-26        0           NaN
2016-12-27        0           NaN
</code></pre>
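<p>The date-based variant can also be written step by step. The sketch below is one possible reading of it, not the only way to do this: it builds the frame directly with <code>pd.date_range</code> (an assumption made to keep the example self-contained) and uses <code>.dt.days</code> instead of dividing by a one-day timedelta. The intermediate names are illustrative.</p>

```python
import pandas as pd

# Same data as above, built directly on a daily DatetimeIndex.
df = pd.DataFrame(
    {'holiday': [0, 0, 0, 0, 0, 1, 0, 0]},
    index=pd.date_range('2016-12-20', periods=8, freq='D', name='SalesDate'),
)

# Holiday dates where holiday == 1, NaT elsewhere. Passing index=df.index
# keeps the Series aligned with df when it is assigned back as a column.
holiday_date = pd.Series(df.index.where(df['holiday'] == 1), index=df.index)

# Back-fill so each row carries the date of the next holiday on or after it.
next_holiday_date = holiday_date.bfill()

# Subtracting the row's own date gives a timedelta; .dt.days extracts whole days.
gap = next_holiday_date - df.index

# Missing gaps come out as NaN floats, so cast to the nullable 'Int32' dtype.
df['next_holiday'] = gap.dt.days.astype('Int32')

print(df)
```

<p>Unlike the RangeIndex version, this counts calendar days, so it should still give the right distance when some dates are missing from the index.</p>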