仅获取月末日期前一个月的值,否则填写pandas/python

2024-05-19 03:19:55 发布

您现在位置:Python中文网/ 问答频道 /正文

假设我有一个如下所示的数据帧

effective_date,ent_id,value,domain,month_end_date,max_avl_date
    2021-03-30,101,61,G,2021-03-30,Y
    2021-03-30,103,64,G,2021-03-30,Y
    2021-03-30,101,21,R,2021-03-30,Y
    2021-03-30,103,46,R,2021-03-30,Y
    2021-03-29,101,32,G,2021-03-30,N
    2021-03-29,103,39,G,2021-03-30,N
    2021-03-29,101,21,R,2021-03-30,N
    2021-03-29,103,32,R,2021-03-30,N
    2021-03-15,101,54,G,2021-03-30,N
    2021-03-15,103,72,G,2021-03-30,N
    2021-03-15,101,24,R,2021-03-30,N
    2021-03-15,103,22,R,2021-03-30,N
    2021-03-14,101,61,G,2021-03-30,N
    2021-03-14,103,64,G,2021-03-30,N
    2021-03-14,101,31,R,2021-03-30,N
    2021-03-14,103,34,R,2021-03-30,N
    2021-03-13,101,32,G,2021-03-30,N
    2021-03-13,103,39,G,2021-03-30,N
    2021-03-13,101,33,R,2021-03-30,N
    2021-03-13,103,33,R,2021-03-30,N
    2021-03-05,101,82,G,2021-03-30,N
    2021-03-05,103,15,G,2021-03-30,N
    2021-03-05,101,27,R,2021-03-30,N
    2021-03-05,103,19,R,2021-03-30,N
    2021-03-04,101,41,G,2021-03-30,N
    2021-03-04,103,28,G,2021-03-30,N
    2021-03-04,101,94,R,2021-03-30,N
    2021-03-04,103,88,R,2021-03-30,N
    2021-03-03,101,98,G,2021-03-30,N
    2021-03-03,103,72,G,2021-03-30,N
    2021-03-03,101,87,R,2021-03-30,N
    2021-03-03,103,43,R,2021-03-30,N
    2021-03-02,101,53,G,2021-03-30,N
    2021-03-02,103,12,G,2021-03-30,N
    2021-03-02,101,35,R,2021-03-30,N
    2021-03-02,103,42,R,2021-03-30,N
    2021-03-01,101,71,G,2021-03-30,N
    2021-03-01,103,72,G,2021-03-30,N
    2021-03-01,101,31,R,2021-03-30,N
    2021-03-01,103,32,R,2021-03-30,N
    2021-02-28,101,51,G,2021-02-28,Y
    2021-02-28,103,27,G,2021-02-28,Y
    2021-02-28,101,51,R,2021-02-28,Y
    2021-02-28,103,57,R,2021-02-28,Y
    2021-02-27,101,54,G,2021-02-28,N
    2021-02-27,103,72,G,2021-02-28,N
    2021-02-27,101,74,R,2021-02-28,N
    2021-02-27,103,77,R,2021-02-28,N

我想为下一个月的结束日期获取上一个月的结束值,对于当天的剩余时间,它被填充为NaN

输出结果如下:

    effective_date,ent_id,value,domain,month_end_date,max_avl_date,1m_prev_value
2021-03-30,101,61,G,2021-03-30,Y,51
2021-03-30,103,64,G,2021-03-30,Y,27
2021-03-30,101,21,R,2021-03-30,Y,51
2021-03-30,103,46,R,2021-03-30,Y,57
2021-03-29,101,32,G,2021-03-30,N,
2021-03-29,103,39,G,2021-03-30,N,
2021-03-29,101,21,R,2021-03-30,N,
2021-03-29,103,32,R,2021-03-30,N,
2021-03-15,101,54,G,2021-03-30,N,
2021-03-15,103,72,G,2021-03-30,N,
2021-03-15,101,24,R,2021-03-30,N,
2021-03-15,103,22,R,2021-03-30,N,
2021-03-14,101,61,G,2021-03-30,N,
2021-03-14,103,64,G,2021-03-30,N,
2021-03-14,101,31,R,2021-03-30,N,
2021-03-14,103,34,R,2021-03-30,N,
2021-03-13,101,32,G,2021-03-30,N,
2021-03-13,103,39,G,2021-03-30,N,
2021-03-13,101,33,R,2021-03-30,N,
2021-03-13,103,33,R,2021-03-30,N,
2021-03-05,101,82,G,2021-03-30,N,
2021-03-05,103,15,G,2021-03-30,N,
2021-03-05,101,27,R,2021-03-30,N,
2021-03-05,103,19,R,2021-03-30,N,
2021-03-04,101,41,G,2021-03-30,N,
2021-03-04,103,28,G,2021-03-30,N,
2021-03-04,101,94,R,2021-03-30,N,
2021-03-04,103,88,R,2021-03-30,N,
2021-03-03,101,98,G,2021-03-30,N,
2021-03-03,103,72,G,2021-03-30,N,
2021-03-03,101,87,R,2021-03-30,N,
2021-03-03,103,43,R,2021-03-30,N,
2021-03-02,101,53,G,2021-03-30,N,
2021-03-02,103,12,G,2021-03-30,N,
2021-03-02,101,35,R,2021-03-30,N,
2021-03-02,103,42,R,2021-03-30,N,
2021-03-01,101,71,G,2021-03-30,N,
2021-03-01,103,72,G,2021-03-30,N,
2021-03-01,101,31,R,2021-03-30,N,
2021-03-01,103,32,R,2021-03-30,N,
2021-02-28,101,51,G,2021-02-28,Y,
2021-02-28,103,27,G,2021-02-28,Y,
2021-02-28,101,51,R,2021-02-28,Y,
2021-02-28,103,57,R,2021-02-28,Y,
2021-02-27,101,54,G,2021-02-28,N,
2021-02-27,103,72,G,2021-02-28,N,
2021-02-27,101,74,R,2021-02-28,N,
2021-02-27,103,77,R,2021-02-28,N,

我知道如何在sql中实现这一点,因此下面是sql中相同逻辑的副本

DECODE(max_avl_date,Y,LEAD(VALUE,1) OVER (PARTITION BY ent_id,domain order by max_avl_date desc,effective_date desc),null)

我在padas中尝试了同样的逻辑,但没有成功

我的python/pandas工作:

df['1m_prev_value']= np.where((df['max_avl_date']=='Y',(df.sort_values(by=['domain','ent_id','max_avl_date','effective_date'],ascending=False)).groupby(['domain','ent_id']).shift(-1)),np.nan)

Tags: iddfsqldatebyvaluedomain逻辑
1条回答
网友
1楼 · 发布于 2024-05-19 03:19:55

嗨,通过这个我已经达到了我的要求

df_ss.loc[df_ss['max_avl_date']=='Y','1m_prev_value']=((df_ss[df_ss['max_avl_date']=='Y']).sort_values(by=['domain','ent_id','max_avl_date','effective_date'],ascending=True).groupby(["domain","max_avl_date",'ent_id'])['value'].shift(1))

相关问题 更多 >

    热门问题