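<p>First, use <code>to_datetime</code> and <code>astype</code> to convert <code>training_date</code> to the first of its month (I label this <code>'date_anchor'</code>). Then set the index, convert the column labels to a <code>datetime</code> dtype, and stack, which gives us an easy way to compute the time difference in the next step.</p>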
<pre><code>import pandas as pd
import numpy as np
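# Parse the training date, then floor it to the first of its month
df['training_date'] = pd.to_datetime(df['training_date'])
# Note: newer pandas releases may reject this month-unit cast; there,
# df['training_date'].dt.to_period('M').dt.to_timestamp() is an equivalent
df['date_anchor'] = df.training_date.astype('datetime64[M]')
# Move the identifying columns into the index so only the month columns remain
df = df.set_index(['salesman', 'training_date', 'date_anchor'])
# Parse the 'mm/yy' column labels as dates, then reshape from wide to long
df.columns = pd.Index(pd.to_datetime(df.columns, format='%m/%y'), name='date')
df = df.stack().reset_index()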
#    salesman training_date date_anchor       date    0
#0       John    2020-11-30  2020-11-01 2020-01-01  100
#1       John    2020-11-30  2020-11-01 2020-02-01   20
#2       John    2020-11-30  2020-11-01 2020-03-01  200
#3       John    2020-11-30  2020-11-01 2020-04-01  250
#...
#19     Ruddy    2020-07-12  2020-07-01 2020-08-01   10
#20     Ruddy    2020-07-12  2020-07-01 2020-09-01   20
#21     Ruddy    2020-07-12  2020-07-01 2020-10-01    0
#22     Ruddy    2020-07-12  2020-07-01 2020-11-01   20
#23     Ruddy    2020-07-12  2020-07-01 2020-12-01  100
</code></pre>
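<p>Since the snippet above starts from an existing <code>df</code>, here is a minimal, self-contained sketch of the same reshaping step. The input frame is an assumption, reconstructed from the outputs shown in this answer; the names <code>raw</code>, <code>wide</code>, <code>long_df</code>, and <code>sales</code> are hypothetical. It also swaps the <code>astype</code> cast for a <code>to_period</code>/<code>to_timestamp</code> round trip, which should behave the same way for flooring to the month start.</p>

```python
import pandas as pd

# Hypothetical input, reconstructed from the outputs shown in this answer:
# one row per salesman, one 'mm/yy' column per month of 2020.
months = [f"{m:02d}/20" for m in range(1, 13)]  # '01/20' ... '12/20'
raw = pd.DataFrame(
    [
        ["John", "2020-11-30", 100, 20, 200, 250, 0, 28, 80, 30, 150, 100, 300, 250],
        ["Ruddy", "2020-07-12", 90, 50, 30, 225, 300, 100, 95, 10, 20, 0, 20, 100],
    ],
    columns=["salesman", "training_date"] + months,
)

raw["training_date"] = pd.to_datetime(raw["training_date"])
# Floor each training date to the first day of its month.
raw["date_anchor"] = raw["training_date"].dt.to_period("M").dt.to_timestamp()

# Keep only the month columns as data, parse their labels, and go wide -> long.
wide = raw.set_index(["salesman", "training_date", "date_anchor"])
wide.columns = pd.Index(pd.to_datetime(wide.columns, format="%m/%y"), name="date")
# 'sales' is a made-up name for the stacked values (the answer leaves it as 0).
long_df = wide.stack().rename("sales").reset_index()

print(long_df.head())
```

<p>Naming the stacked values is optional; the answer's own code keeps the default column name <code>0</code> and refers to it later with <code>values=0</code>.</p>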
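<p>Now we need the whole number of months between the two dates, which takes a little arithmetic, and then <code>np.select</code> to label each month as prior to, equal to, or after the training month. Finally, pivot the DataFrame; months for which a salesman has no data come out as <code>NaN</code>.</p>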
<pre><code># Whole months between each column date and the training month (negative = before)
df['months'] = ((df.date.dt.year - df.date_anchor.dt.year) * 12
                + (df.date.dt.month - df.date_anchor.dt.month))
# Turn each offset into a label: training_month, Nm_prior, or Nm_post
df['months'] = np.select([df.months.eq(0), df.months.lt(0)],
                         ['training_month', df.months.abs().astype(str) + 'm_prior'],
                         df.months.abs().astype(str) + 'm_post')
# values=0 is the unnamed column produced by stack()
df = (df.pivot_table(index=['salesman', 'training_date'], columns='months', values=0)
        .rename_axis(columns=None)
        .reset_index())
</code></pre>
<hr/>
<pre><code>   salesman  training_date  10m_prior  1m_post  1m_prior  2m_post  2m_prior  3m_post  3m_prior  4m_post  4m_prior  5m_post  5m_prior  6m_prior  7m_prior  8m_prior  9m_prior  training_month
0      John     2020-11-30      100.0    250.0     100.0      NaN     150.0      NaN      30.0      NaN      80.0      NaN      28.0       0.0     250.0     200.0      20.0           300.0
1     Ruddy     2020-07-12        NaN     10.0     100.0     20.0     300.0      0.0     225.0     20.0      30.0    100.0      50.0      90.0       NaN       NaN       NaN            95.0
</code></pre>
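<p>To see the month arithmetic and the labelling in isolation, here is a small sketch on four made-up dates around a November 2020 training month. The frame <code>pairs</code> and the helpers <code>month_offset</code> and <code>month_label</code> are hypothetical names used only for illustration; the formula and the <code>np.select</code> call mirror the ones above.</p>

```python
import numpy as np
import pandas as pd

# Made-up sample: four month-start dates compared against one anchor month.
pairs = pd.DataFrame({
    "date": pd.to_datetime(["2020-09-01", "2020-11-01", "2020-12-01", "2021-01-01"]),
    "date_anchor": pd.to_datetime(["2020-11-01"] * 4),
})


def month_offset(date, anchor):
    """Whole months from anchor to date; negative when date is earlier."""
    return (date.dt.year - anchor.dt.year) * 12 + (date.dt.month - anchor.dt.month)


def month_label(offset):
    """Map 0 -> 'training_month', negatives -> 'Nm_prior', positives -> 'Nm_post'."""
    size = offset.abs().astype(str)
    return np.select(
        [offset.eq(0), offset.lt(0)],           # conditions, checked in order
        ["training_month", size + "m_prior"],   # value for each condition
        size + "m_post",                        # default: strictly after the anchor
    )


pairs["months"] = month_offset(pairs["date"], pairs["date_anchor"])
pairs["label"] = month_label(pairs["months"])
print(pairs)
```

<p>Note that the year term matters: January 2021 is 2 months after November 2020, which the month difference alone (1 - 11) would get wrong.</p>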