<p>I group the data into 7-day bins and store the cumulative sum in the <code>VIEWS_CUM_BEFORE</code> column.</p>
<h2>Solution with a single value column</h2>
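<pre><code># Drop the other numeric column so that VIEWS is the only column left to accumulate
df = df.drop(['VIEWS_CUM'], axis=1)
# Cumulative sum within each 7-day bin of DAY, separately for every GROUP
df['VIEWS_CUM_BEFORE'] = df.groupby([pd.Grouper(freq='7D',key='DAY'),'GROUP']).cumsum()
</code></pre>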
<h2>Or select the column for cumsum explicitly</h2>
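<p>A minimal sketch of this variant, assuming the column to accumulate is selected with <code>['VIEWS']</code> before calling <code>cumsum</code>. The sample frame is rebuilt from the rows in this answer so that the snippet runs on its own:</p>

```python
import pandas as pd

# Sample data copied from the tables in this answer
df = pd.DataFrame({
    "GROUP": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "DAY": pd.to_datetime([
        "2011-09-18", "2011-09-19", "2011-12-21", "2011-12-22", "2011-12-23",
        "2012-01-07", "2012-01-08", "2012-01-09",
        "2012-01-17", "2012-01-18", "2012-01-19",
    ]),
    "VIEWS": [82, 15, 29, 15, 2, 51, 10, 11, 33, 29, 6],
})

# Bin DAY into 7-day windows, split by GROUP, and accumulate only VIEWS
grouper = [pd.Grouper(freq="7D", key="DAY"), "GROUP"]
df["VIEWS_CUM_BEFORE"] = df.groupby(grouper)["VIEWS"].cumsum()

print(df)
```

<p>Selecting the column explicitly means no other numeric column has to be dropped first.</p>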
<h2>NumPy solution</h2>
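<pre><code># Requires: import numpy as np. transform keeps the original row index, so the result assigns back cleanly
df['VIEWS_CUM_BEFORE'] = df.groupby([pd.Grouper(freq='7D',key='DAY'),'GROUP'])['VIEWS'].transform(np.cumsum)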
</code></pre>
<p>However, <code>cumsum</code> is also computed for the first subgroup of each <code>GROUP</code>, and those rows need the value <code>0</code> instead.</p>
<pre><code>    GROUP         DAY  VIEWS  VIEWS_CUM_BEFORE
0       1  2011-09-18     82                82
1       1  2011-09-19     15                97
2       1  2011-12-21     29                29
3       1  2011-12-22     15                44
4       1  2011-12-23      2                46
5       2  2012-01-07     51                51
6       2  2012-01-08     10                10
7       2  2012-01-09     11                21
8       2  2012-01-17     33                33
9       2  2012-01-18     29                62
10      2  2012-01-19      6                68
</code></pre>
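<p>We have to find the minimum <code>DAY</code> of each group, add 7 days to it, and set <code>VIEWS_CUM_BEFORE</code> to <code>0</code> wherever <code>DAY</code> is earlier than that date.</p>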
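<pre><code>def repeat_value(grp):
    # Cutoff date: the group's earliest DAY plus 7 days, repeated on every row
    grp['DAY2'] = grp['DAY'].min() + pd.Timedelta('7 days')
    return grp

# group_keys=False keeps the original row index on recent pandas versions
df = df.groupby(['GROUP'], group_keys=False).apply(repeat_value)
print(df)
</code></pre>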
<pre><code>    GROUP         DAY  VIEWS  VIEWS_CUM_BEFORE        DAY2
0       1  2011-09-18     82                82  2011-09-25
1       1  2011-09-19     15                97  2011-09-25
2       1  2011-12-21     29                29  2011-09-25
3       1  2011-12-22     15                44  2011-09-25
4       1  2011-12-23      2                46  2011-09-25
5       2  2012-01-07     51                51  2012-01-14
6       2  2012-01-08     10                10  2012-01-14
7       2  2012-01-09     11                21  2012-01-14
8       2  2012-01-17     33                33  2012-01-14
9       2  2012-01-18     29                62  2012-01-14
10      2  2012-01-19      6                68  2012-01-14
</code></pre>
<pre><code># Zero out the rows that fall before the cutoff, then drop the helper column
df.loc[df['DAY2'] > df['DAY'], 'VIEWS_CUM_BEFORE'] = 0
del df['DAY2']
print(df)
</code></pre>
<pre><code>    GROUP         DAY  VIEWS  VIEWS_CUM_BEFORE
0       1  2011-09-18     82                 0
1       1  2011-09-19     15                 0
2       1  2011-12-21     29                29
3       1  2011-12-22     15                44
4       1  2011-12-23      2                46
5       2  2012-01-07     51                 0
6       2  2012-01-08     10                 0
7       2  2012-01-09     11                 0
8       2  2012-01-17     33                33
9       2  2012-01-18     29                62
10      2  2012-01-19      6                68
</code></pre>
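<p>The same steps can also be sketched end to end without a helper function or a temporary <code>DAY2</code> column. This is an alternative under stated assumptions, not the code used above: it swaps <code>groupby(...).apply(repeat_value)</code> for <code>transform('min')</code>, and the name <code>cutoff</code> is introduced here only for illustration.</p>

```python
import pandas as pd

# Sample data copied from the tables in this answer
df = pd.DataFrame({
    "GROUP": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "DAY": pd.to_datetime([
        "2011-09-18", "2011-09-19", "2011-12-21", "2011-12-22", "2011-12-23",
        "2012-01-07", "2012-01-08", "2012-01-09",
        "2012-01-17", "2012-01-18", "2012-01-19",
    ]),
    "VIEWS": [82, 15, 29, 15, 2, 51, 10, 11, 33, 29, 6],
})

# Step 1: cumulative sum of VIEWS within each 7-day bin and GROUP
grouper = [pd.Grouper(freq="7D", key="DAY"), "GROUP"]
df["VIEWS_CUM_BEFORE"] = df.groupby(grouper)["VIEWS"].cumsum()

# Step 2: per-row cutoff = earliest DAY of the row's GROUP plus 7 days
# (hypothetical name; it plays the role of the DAY2 column)
cutoff = df.groupby("GROUP")["DAY"].transform("min") + pd.Timedelta("7 days")

# Step 3: rows before the cutoff get 0 instead of the cumulative sum
df.loc[df["DAY"] < cutoff, "VIEWS_CUM_BEFORE"] = 0

print(df)
```

<p>Because <code>transform</code> returns a Series aligned with the original index, the comparison works row by row and nothing has to be deleted afterwards.</p>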