How can I perform this calculation with np.where()/np.select(), or a better method in pandas/Python?

Posted 2024-10-06 13:31:01


Suppose I have a DataFrame:

date,ent_id,val,month_end,FLAG1,FLAG2,FLAG3,FLAG4
2021-03-23,101,61,2021-03-31,R,R,R,R
2021-03-12,103,64,2021-03-31,G,R,R,R
2021-03-15,101,32,2021-03-31,F,R,F,R
2021-04-01,103,39,2021-04-30,R,F,G,R
2021-04-02,101,71,2021-04-30,R,F,R,G
2021-04-02,103,79,2021-04-30,F,F,R,F
2021-04-30,101,51,2021-04-30,G,F,F,G
2021-04-30,103,53,2021-04-30,G,G,F,F
2021-05-31,101,28,2021-05-31,R,R,G,F
2021-05-31,103,26,2021-05-31,F,F,R,R
2021-05-31,103,61,2021-05-31,R,F,F,G
2021-05-31,101,47,2021-05-31,F,F,F,R
2021-06-06,101,45,2021-06-30,G,G,F,R
2021-06-06,103,78,2021-06-30,G,G,G,G
2021-06-07,101,23,2021-06-30,R,G,R,R
2021-06-07,103,31,2021-06-30,R,R,F,F
2021-07-31,101,14,2021-07-30,F,G,F,G
2021-07-31,103,25,2021-07-30,F,R,R,R
2021-07-31,101,82,2021-07-30,F,F,G,F
2021-07-31,103,15,2021-07-30,F,G,F,G

I want to create a new column, CMPT, based on the following (SQL) condition:

CASE WHEN (date = month_end) THEN (DECODE(FLAG1,'R',10,'G',20,'F',-5) + DECODE(FLAG2,'R',10,'G',20,'F',-5) + DECODE(FLAG3,'R',10,'G',20,'F',-5))
ELSE (DECODE(FLAG2,'R',10,'G',20,'F',-5) + DECODE(FLAG3,'R',10,'G',20,'F',-5) + DECODE(FLAG4,'R',10,'G',20,'F',-5))
END
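To make the rule concrete, here is a minimal row-wise sketch of the CASE/DECODE expression in Python. The names `SCORES` and `cmpt_for_row` are made up for illustration, and the two rows are copied from the sample data. This is a slow reference version that is easy to check by hand, not a recommended solution for large frames.

```python
import pandas as pd

# Equivalent of DECODE(flag, 'R', 10, 'G', 20, 'F', -5)
SCORES = {"R": 10, "G": 20, "F": -5}

def cmpt_for_row(row):
    """Row-wise mirror of the CASE expression (hypothetical helper)."""
    if row["date"] == row["month_end"]:
        flags = ("FLAG1", "FLAG2", "FLAG3")
    else:
        flags = ("FLAG2", "FLAG3", "FLAG4")
    return sum(SCORES[row[f]] for f in flags)

# Two rows taken from the sample data: one mid-month, one at month end.
sample = pd.DataFrame({
    "date": ["2021-04-01", "2021-04-30"],
    "month_end": ["2021-04-30", "2021-04-30"],
    "FLAG1": ["R", "G"],
    "FLAG2": ["F", "F"],
    "FLAG3": ["G", "F"],
    "FLAG4": ["R", "G"],
})
sample["CMPT"] = sample.apply(cmpt_for_row, axis=1)
print(sample["CMPT"].tolist())  # [25, 10]
```

The first row is not at month end, so it sums FLAG2..FLAG4 (-5 + 20 + 10 = 25). The second row is at month end, so it sums FLAG1..FLAG3 (20 - 5 - 5 = 10).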

So the output would look like this:

date,ent_id,val,month_end,FLAG1,FLAG2,FLAG3,FLAG4,CMPT
2021-03-23,101,61,2021-03-31,R,R,R,R,30
2021-03-12,103,64,2021-03-31,G,R,R,R,30
2021-03-15,101,32,2021-03-31,F,R,F,R,15
2021-04-01,103,39,2021-04-30,R,F,G,R,25
2021-04-02,101,71,2021-04-30,R,F,R,G,25
2021-04-02,103,79,2021-04-30,F,F,R,F,0
2021-04-30,101,51,2021-04-30,G,F,F,G,10
2021-04-30,103,53,2021-04-30,G,G,F,F,35
2021-05-31,101,28,2021-05-31,R,R,G,F,40
2021-05-31,103,26,2021-05-31,F,F,R,R,0
2021-05-31,103,61,2021-05-31,R,F,F,G,0
2021-05-31,101,47,2021-05-31,F,F,F,R,-15
2021-06-06,101,45,2021-06-30,G,G,F,R,25
2021-06-06,103,78,2021-06-30,G,G,G,G,60
2021-06-07,101,23,2021-06-30,R,G,R,R,40
2021-06-07,103,31,2021-06-30,R,R,F,F,0
2021-07-31,101,14,2021-07-30,F,G,F,G,35
2021-07-31,103,25,2021-07-30,F,R,R,R,30
2021-07-31,101,82,2021-07-30,F,F,G,F,10
2021-07-31,103,15,2021-07-30,F,G,F,G,35

I have tried this with np.where and np.select, but without success.


1 answer

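I would first map the FLAG column codes to their respective values. Then use np.where to take the sum of the appropriate columns, depending on whether df.date == df.month_end or df.date != df.month_end: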

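import numpy as np

# Map each flag code to its score (the DECODE step).
flagmap = df.filter(like="FLAG").replace({'R': 10, 'G': 20, 'F': -5})

# Month-end rows sum FLAG1..FLAG3; all other rows sum FLAG2..FLAG4.
df['CMPT'] = np.where(
    df.date == df.month_end,
    flagmap[['FLAG1', 'FLAG2', 'FLAG3']].sum(axis=1),
    flagmap[['FLAG2', 'FLAG3', 'FLAG4']].sum(axis=1)
)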

          date  ent_id  val   month_end FLAG1 FLAG2 FLAG3 FLAG4  CMPT
0   2021-03-23     101   61  2021-03-31     R     R     R     R    30
1   2021-03-12     103   64  2021-03-31     G     R     R     R    30
2   2021-03-15     101   32  2021-03-31     F     R     F     R    15
3   2021-04-01     103   39  2021-04-30     R     F     G     R    25
4   2021-04-02     101   71  2021-04-30     R     F     R     G    25
5   2021-04-02     103   79  2021-04-30     F     F     R     F     0
6   2021-04-30     101   51  2021-04-30     G     F     F     G    10
7   2021-04-30     103   53  2021-04-30     G     G     F     F    35
8   2021-05-31     101   28  2021-05-31     R     R     G     F    40
9   2021-05-31     103   26  2021-05-31     F     F     R     R     0
10  2021-05-31     103   61  2021-05-31     R     F     F     G     0
11  2021-05-31     101   47  2021-05-31     F     F     F     R   -15
12  2021-06-06     101   45  2021-06-30     G     G     F     R    25
13  2021-06-06     103   78  2021-06-30     G     G     G     G    60
14  2021-06-07     101   23  2021-06-30     R     G     R     R    40
15  2021-06-07     103   31  2021-06-30     R     R     F     F     0
16  2021-07-31     101   14  2021-07-30     F     G     F     G    35
17  2021-07-31     103   25  2021-07-30     F     R     R     R    30
18  2021-07-31     101   82  2021-07-30     F     F     G     F    10
19  2021-07-31     103   15  2021-07-30     F     G     F     G    35
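Since the question also mentions np.select, the same calculation can be sketched with it. This is an illustrative variant, not part of the original answer: it uses only three rows copied from the sample data, and it maps the flags with `Series.map` instead of `replace` so the mapped columns are numeric without relying on dtype inference. With only two branches np.where is simpler; np.select tends to pay off when there are more conditions.

```python
import numpy as np
import pandas as pd

# Three rows copied from the sample data.
df = pd.DataFrame({
    "date": ["2021-03-15", "2021-05-31", "2021-07-31"],
    "month_end": ["2021-03-31", "2021-05-31", "2021-07-30"],
    "FLAG1": ["F", "R", "F"],
    "FLAG2": ["R", "R", "G"],
    "FLAG3": ["F", "G", "F"],
    "FLAG4": ["R", "F", "G"],
})

scores = {"R": 10, "G": 20, "F": -5}
# Map every FLAG column through the score table.
flagmap = df.filter(like="FLAG").apply(lambda col: col.map(scores))

# Boolean array: True where the row falls on its month end.
is_month_end = (df["date"] == df["month_end"]).to_numpy()

df["CMPT"] = np.select(
    [is_month_end, ~is_month_end],
    [
        flagmap[["FLAG1", "FLAG2", "FLAG3"]].sum(axis=1),
        flagmap[["FLAG2", "FLAG3", "FLAG4"]].sum(axis=1),
    ],
    default=0,  # never used here: the two conditions cover every row
)
print(df["CMPT"].tolist())  # [15, 40, 35]
```

The conditions are evaluated in order and the first match wins, which is how a multi-branch CASE WHEN behaves as well.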

Alternatively, without numpy, use loc to select the rows where df.date == df.month_end, then the rows where df.date != df.month_end, and sum the appropriate columns in each case:

flagmap = df.filter(like="FLAG").replace({'R': 10, 'G': 20, 'F': -5})
# Month-end rows: FLAG1 + FLAG2 + FLAG3 (assignment aligns on the index).
df.loc[df.date == df.month_end, 'CMPT'] = flagmap[['FLAG1', 'FLAG2', 'FLAG3']].sum(axis=1)
# All other rows: FLAG2 + FLAG3 + FLAG4.
df.loc[df.date != df.month_end, 'CMPT'] = flagmap[['FLAG2', 'FLAG3', 'FLAG4']].sum(axis=1)
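One thing to watch with the loc version: the first assignment creates CMPT with NaN in the rows it does not touch, so the column typically ends up as a float dtype even after the second assignment fills the gaps. A minimal sketch, assuming two rows copied from the sample data and using `Series.map` (my choice here, not the answer's) for the score lookup, shows one way to cast back to integers:

```python
import pandas as pd

# Two rows copied from the sample data: one at month end, one not.
df = pd.DataFrame({
    "date": ["2021-04-30", "2021-06-06"],
    "month_end": ["2021-04-30", "2021-06-30"],
    "FLAG1": ["G", "G"],
    "FLAG2": ["F", "G"],
    "FLAG3": ["F", "F"],
    "FLAG4": ["G", "R"],
})

scores = {"R": 10, "G": 20, "F": -5}
flagmap = df.filter(like="FLAG").apply(lambda col: col.map(scores))

at_month_end = df["date"] == df["month_end"]
df.loc[at_month_end, "CMPT"] = flagmap[["FLAG1", "FLAG2", "FLAG3"]].sum(axis=1)
df.loc[~at_month_end, "CMPT"] = flagmap[["FLAG2", "FLAG3", "FLAG4"]].sum(axis=1)

# The column was created with NaN placeholders, hence a float dtype.
dtype_before = df["CMPT"].dtype

# Every row is filled by now, so the cast to integer is safe.
df["CMPT"] = df["CMPT"].astype("int64")
print(dtype_before, df["CMPT"].tolist())
```

The cast is only safe because the two masks together cover every row; if some rows could stay NaN, `astype("int64")` would raise.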
