将月与月转换为月与月之前

2024-10-03 17:21:47 发布

您现在位置:Python中文网/ 问答频道 /正文

初始表

salesman    training_date   01/20   02/20   03/20   04/20   05/20   06/20   07/20   08/20   09/20   10/20   11/20   12/20
0   John    2020-11-01       100      20     200     250      0       28      80      30     150     100     300    250
1   Ruddy   2020-07-12       90       50      30     225     300     100      95      10      20      0     20      100

在Python中:

t1 = {'salesman': ['John', 'Ruddy'],
     'training_date':['2020-11-30','2020-07-12'],
     '01/20': [100, 90], '02/20':[20,50], '03/20':[200,30],'04/20':[250,225],'05/20':[0,300],'06/20':[28,100],
     '07/20': [80, 95], '08/20':[30,10], '09/20':[150,20],'10/20':[100,0],'11/20':[300,20],'12/20':[250,100],
     }
t1a = pd.DataFrame(data=t1)
t1a

预期的数据帧:

    salesman    training_date   training_month  1m_prior    2m_prior    3m_prior    4m_prior    5m_prior    6m_prior
0   John          2020-11-30         300           100        150           30         80      28       0
1   Ruddy         2020-07-12          95           100        300          225         30      50      90

在Python中:

t2 = {'salesman': ['John', 'Ruddy'],
     'training_date':['2020-11-30','2020-07-12'],
     'training_month': [300, 95], '1m_prior':[100,100], '2m_prior':[150,300],
     '3m_prior':[30,225],'4m_prior':[80,30],'5m_prior':[28,50], '6m_prior': [0, 90]}
t2a = pd.DataFrame(data=t2)
t2a

说明:
约翰在11月1日接受训练。在11月1日之前,约翰在10月份赚了100万美元。 在11月1日,9月1日之前,约翰赚了150美元

鲁迪在7月12日接受训练。在7月12日之前,即6月,Ruddy赚了100万美元。 在5月12日之前的200万美元,Ruddy赚了300美元

在理想情况下,我们开始计算一个完整的月份,总是从每个月的1号开始。 所以,如果Ruddy是在2020年7月12日被雇佣的,那么一个月前应该是6月1日至6月30日

到目前为止,我们在Excel中手动转换数据


Tags: 数据dataframedatadatetrainingjohnpdt1
2条回答

熔化+枢轴

t1a['training_date'] = pd.to_datetime(t1a['training_date'])

如果您在6个月前确实需要:

u = t1a.melt(['salesman','training_date'])
diff_ = (pd.to_datetime(u['variable'],format='%m/%y').dt.month
           .sub(u['training_date'].dt.month))

u = u.assign(k=diff_).loc[lambda x: x['k'].between(-6,0)]
choices = (diff_.abs().astype(str).replace("0","").add("")+
            np.select([diff_<0,diff_==0],['months_prior','training_month']))

out = (u.assign(Key=choices).pivot_table(index=["salesman","training_date"]
,columns="Key",values="value").set_index("training_month",append=True))

print(out)
Key                                    1months_prior  2months_prior  \
salesman training_date training_month                                 
John     2020-11-30    300                       100            150   
Ruddy    2020-07-12    95                        100            300   

Key                                    3months_prior  4months_prior  \
salesman training_date training_month                                 
John     2020-11-30    300                        30             80   
Ruddy    2020-07-12    95                        225             30   

Key                                    5months_prior  6months_prior  
salesman training_date training_month                                
John     2020-11-30    300                        28              0  
Ruddy    2020-07-12    95                         50             90 

如果您想要所有月份,包括之前和之后:

u = t1a.melt(['salesman','training_date'])
diff_ = (pd.to_datetime(u['variable'],format='%m/%y').dt.month
           .sub(u['training_date'].dt.month))

choices = (diff_.abs().astype(str).replace("0","").add("")+
           np.select([diff_>0,diff_<0,diff_==0],
           ['months_later','months_prior','training_month']))

out = (u.assign(Key=choices).pivot_table(index=["salesman","training_date"]
   ,columns="Key",values="value").set_index("training_month",append=True))

print(out)

Key                                    10months_prior  1months_later  \
salesman training_date training_month                                  
John     2020-11-30    300.0                    100.0          250.0   
Ruddy    2020-07-12    95.0                       NaN           10.0   

Key                                    1months_prior  2months_later  \
salesman training_date training_month                                 
John     2020-11-30    300.0                   100.0            NaN   
Ruddy    2020-07-12    95.0                    100.0           20.0   

Key                                    2months_prior  3months_later  \
salesman training_date training_month                                 
John     2020-11-30    300.0                   150.0            NaN   
Ruddy    2020-07-12    95.0                    300.0            0.0   

Key                                    3months_prior  4months_later  \
salesman training_date training_month                                 
John     2020-11-30    300.0                    30.0            NaN   
Ruddy    2020-07-12    95.0                    225.0           20.0   

Key                                    4months_prior  5months_later  \
salesman training_date training_month                                 
John     2020-11-30    300.0                    80.0            NaN   
Ruddy    2020-07-12    95.0                     30.0          100.0   

Key                                    5months_prior  6months_prior  \
salesman training_date training_month                                 
John     2020-11-30    300.0                    28.0            0.0   
Ruddy    2020-07-12    95.0                     50.0           90.0   

Key                                    7months_prior  8months_prior  \
salesman training_date training_month                                 
John     2020-11-30    300.0                   250.0          200.0   
Ruddy    2020-07-12    95.0                      NaN            NaN   

Key                                    9months_prior  
salesman training_date training_month                 
John     2020-11-30    300.0                    20.0  
Ruddy    2020-07-12    95.0                      NaN 

首先使用to_datetime和astype(我将其标记为'anchor_date')将training_date转换为当月的第一个。然后,我们设置索引并将列转换为datetime数据类型和堆栈,为我们在下一步中计算时间差提供了一种简单的方法

import pandas as pd
import numpy as np

# Make datetime and then turn value into first of the month
df['training_date'] = pd.to_datetime(df['training_date'])
df['date_anchor'] = df.training_date.astype('datetime64[M]')

df = df.set_index(['salesman', 'training_date', 'date_anchor'])
df.columns = pd.Index(pd.to_datetime(df.columns, format='%m/%y'), name='date')
df = df.stack().reset_index()

#   salesman training_date date_anchor       date    0
#0      John    2020-11-30  2020-11-01 2020-01-01  100
#1      John    2020-11-30  2020-11-01 2020-02-01   20
#2      John    2020-11-30  2020-11-01 2020-03-01  200
#3      John    2020-11-30  2020-11-01 2020-04-01  250
#...
#19    Ruddy    2020-07-12  2020-07-01 2020-08-01   10
#20    Ruddy    2020-07-12  2020-07-01 2020-09-01   20
#21    Ruddy    2020-07-12  2020-07-01 2020-10-01    0
#22    Ruddy    2020-07-12  2020-07-01 2020-11-01   20
#23    Ruddy    2020-07-12  2020-07-01 2020-12-01  100   

现在我们需要计算两者之间的整数月数,这可以通过一些数学来实现,并在将来使用np.select到NaN个月并设置标签。最后,以数据帧为轴心

df['months'] = ((df.date.dt.year - df.date_anchor.dt.year) * 12 
                 + (df.date.dt.month - df.date_anchor.dt.month))

df['months'] = np.select([df.months.eq(0), df.months.lt(0)],
                         ['training_month', df.months.abs().astype(str) + 'm_prior'],
                         df.months.abs().astype(str) + 'm_post')

df = (df.pivot_table(index=['salesman', 'training_date'], columns='months', values=0)
        .rename_axis(columns=None)
        .reset_index())

  salesman training_date  10m_prior  1m_post  1m_prior  2m_post  2m_prior  3m_post  3m_prior  4m_post  4m_prior  5m_post  5m_prior  6m_prior  7m_prior  8m_prior  9m_prior  training_month
0     John    2020-11-30      100.0    250.0     100.0      NaN     150.0      NaN      30.0      NaN      80.0      NaN      28.0       0.0     250.0     200.0      20.0           300.0
1    Ruddy    2020-07-12        NaN     10.0     100.0     20.0     300.0      0.0     225.0     20.0      30.0    100.0      50.0      90.0       NaN       NaN       NaN            95.0

相关问题 更多 >