Conditional cumulative count of elapsed months in a DataFrame

Posted 2024-10-02 22:30:50


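I'm showing only two ids to illustrate the problem, but the real df has many more rows. For each id, I want to create a new column that counts the number of months since the last raise. If an id has two raises, the count should reset again at the second raise. If an id has only one raise, the count before that raise should be measured from the earliest month present in the df for that user (the user's first row).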

Given a DataFrame like this:

id   amount         date   diff_amount   raise
 1      200   01/02/2018           NaN       0
 1      200   01/03/2018             0       0
 1      200   01/04/2018             0       0
 1      250   01/05/2018            50       1
 1      250   01/06/2018             0       0
 1      250   01/07/2018             0       0
 1      250   01/09/2018             0       0
 1      250   01/11/2018             0       0
 1      350   01/12/2018           100       1
 1      250   01/01/2019             0       0
 2      500   01/06/2018           NaN       0
 2      500   01/07/2018             0       0
 2      500   01/06/2018             0       0
 2      550   01/07/2018            50       1

Expected result:

id   amount         date   diff_amount   raise     month_from_last_raise
 1      200   01/02/2018           NaN       0                         0
 1      200   01/03/2018             0       0                         1
 1      200   01/04/2018             0       0                         2
 1      250   01/05/2018            50       1                         3
 1      250   01/06/2018             0       0                         1
 1      250   01/07/2018             0       0                         2
 1      250   01/09/2018             0       0                         3
 1      250   01/11/2018             0       0                         4
 1      350   01/12/2018           100       1                         5
 1      250   01/01/2019             0       0                         1
 2      500   01/06/2018           NaN       0                         0
 2      500   01/07/2018             0       0                         1
 2      500   01/06/2018             0       0                         2
 2      550   01/07/2018            50       1                         3

1 answer
User
#1 · Posted 2024-10-02 22:30:50

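I'm not sure I fully understand the question, but if I do, here is one solution: keep a running total of the raises, then group by id and by that running total. The input DataFrame should be named "df"; the output is named "final_df":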

import pandas as pd

# Running total of raises; the value changes at every raise row.
df['cumulative'] = df['raise'].cumsum()

pieces = []
# Split by id first, then by the running total within each id.
for _, id_rows in df.groupby('id'):
    for _, block in id_rows.groupby('cumulative'):
        # Copy so that adding a column does not modify a slice of df.
        block = block.copy()
        # Number the rows of this block 0, 1, 2, ...
        # Note: the count restarts at 0 on the raise row itself.
        block['month_from_last_raise'] = list(range(block.shape[0]))
        pieces.append(block)

# DataFrame.append was removed in pandas 2.0, so combine with pd.concat.
final_df = pd.concat(pieces).drop(columns='cumulative')
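
The loop above restarts the count at 0 on the raise row itself, whereas the expected result in the question keeps counting up to and including the raise row and restarts at 1 on the row after it. A loop-free sketch that reproduces the expected column on the sample data might look like the following. It rests on some assumptions: rows are already in chronological order within each id, and the count is a count of rows rather than of calendar months (the sample dates skip some months while the expected count still increases by 1 per row). The names prev_raise, block and position are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (dates kept as the original strings).
df = pd.DataFrame({
    "id": [1] * 10 + [2] * 4,
    "amount": [200, 200, 200, 250, 250, 250, 250, 250, 350, 250,
               500, 500, 500, 550],
    "date": ["01/02/2018", "01/03/2018", "01/04/2018", "01/05/2018",
             "01/06/2018", "01/07/2018", "01/09/2018", "01/11/2018",
             "01/12/2018", "01/01/2019",
             "01/06/2018", "01/07/2018", "01/06/2018", "01/07/2018"],
    "raise": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

# A raise row still belongs to the block it closes, so the block number
# counts only the raises on earlier rows of the same id.
prev_raise = df.groupby("id")["raise"].shift(fill_value=0)
block = prev_raise.groupby(df["id"]).cumsum().rename("block")

# Position of each row inside its (id, block) group, starting at 0.
position = df.groupby(["id", block]).cumcount()

# The first block of an id starts at 0; later blocks start at 1 because
# their first row is already one step past the raise.
df["month_from_last_raise"] = position + (block > 0).astype(int)

print(df)
```

If the count should instead reflect real calendar gaps, the date column would need to be parsed and differenced, which is a different calculation from the one sketched here.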
