Sum a column per identifier only between a start date and an end date

id  enddate     startdate   ownerId  value
1   2019-10-05  2019-10-05  10       105
2   2019-10-06  2019-10-05  10       240
3   2019-10-07  2019-10-05  10       420
4   2019-10-08  2019-10-08  10       470
5   2019-10-01  2019-10-01  11       320
6   2019-10-02  2019-10-01  11       18
7   2019-10-10  2019-10-10  12       50
8   2019-10-12  2019-10-10  12       412
9   2019-10-14  2019-10-10  12       398
10  2019-10-15  2019-10-12  12       320

id  enddate     startdate   ownerId  value  output
1   2019-10-05  2019-10-05  10       105    105     # Nothing between 2019-10-05 and 2019-10-05
2   2019-10-06  2019-10-05  10       240    345     # Found 1 record (with id 1)
3   2019-10-07  2019-10-05  10       420    765     # Found 2 records (with id 1 and 2)
4   2019-10-08  2019-10-08  10       470    470     # Nothing else between 2019-10-08 and 2019-10-08
5   2019-10-01  2019-10-01  11       320    320     # Reset because Owner is different
6   2019-10-02  2019-10-01  11       18     338     # Found 1 record (with id 5)
7   2019-10-10  2019-10-10  12       50     50      # ...
8   2019-10-12  2019-10-10  12       412    462
9   2019-10-14  2019-10-10  12       398    860
10  2019-10-15  2019-10-12  12       320    1130    # Found 3 records between 2019-10-12 and 2019-10-15 (with id 8, 9 and 10)

2 Answers

Anonymous user

Answer 1 · edited 2024-09-28 01:31:14

You can do this in a single statement:

df['output'] = df.apply(lambda row:
    df[df.ownerId.eq(row.ownerId) & df.enddate.between(row.startdate, row.enddate)]
    .value.sum(), axis=1)
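
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the same row-wise filter. The helper name windowed_sum and the explicit pd.to_datetime conversion are my own additions, not part of the answer; the conversion is only there so the comparison does not depend on the dates being ISO-formatted strings.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": range(1, 11),
    "enddate": ["2019-10-05", "2019-10-06", "2019-10-07", "2019-10-08",
                "2019-10-01", "2019-10-02", "2019-10-10", "2019-10-12",
                "2019-10-14", "2019-10-15"],
    "startdate": ["2019-10-05", "2019-10-05", "2019-10-05", "2019-10-08",
                  "2019-10-01", "2019-10-01", "2019-10-10", "2019-10-10",
                  "2019-10-10", "2019-10-12"],
    "ownerId": [10, 10, 10, 10, 11, 11, 12, 12, 12, 12],
    "value": [105, 240, 420, 470, 320, 18, 50, 412, 398, 320],
})

# Assumption: parse the dates so comparisons work on real timestamps.
df[["startdate", "enddate"]] = df[["startdate", "enddate"]].apply(pd.to_datetime)


def windowed_sum(row):
    """Sum `value` over rows of the same owner whose enddate lies in
    [row.startdate, row.enddate] (both bounds inclusive)."""
    same_owner = df["ownerId"].eq(row["ownerId"])
    in_window = df["enddate"].between(row["startdate"], row["enddate"])
    return df.loc[same_owner & in_window, "value"].sum()


df["output"] = df.apply(windowed_sum, axis=1)
print(df)
```

Note that this scans the whole frame once per row, so the cost grows quadratically with the number of rows; it is fine for small tables like the one in the question.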

Anonymous user

Answer 2 · edited 2024-09-28 01:31:14

If the dataset is not too large, you can use a self-join:
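import pandas as pd

# Parse both date columns so the comparisons are done on real dates.
df[['startdate','enddate']] = df[['startdate','enddate']].apply(pd.to_datetime)

# Pair every row with every row of the same owner, keep the pairs whose
# other end date falls inside the row's window, then sum per id.
# Note: to_numpy() assigns by position, so df must already be sorted by id.
df['output'] = (df.merge(df, on='ownerId', suffixes=('','_y'))
                  .query('startdate <= enddate_y <= enddate')
                  .groupby('id')['value_y']
                  .sum()
                  .to_numpy())
print(df)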
Output:
   id  enddate     startdate   ownerId  value  output
0  1   2019-10-05  2019-10-05  10       105    105
1  2   2019-10-06  2019-10-05  10       240    345
2  3   2019-10-07  2019-10-05  10       420    765
3  4   2019-10-08  2019-10-08  10       470    470
4  5   2019-10-01  2019-10-01  11       320    320
5  6   2019-10-02  2019-10-01  11       18     338
6  7   2019-10-10  2019-10-10  12       50     50
7  8   2019-10-12  2019-10-10  12       412    462
8  9   2019-10-14  2019-10-10  12       398    860
9  10  2019-10-15  2019-10-12  12       320    1130
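
One caveat with the self-join above: the sums come back ordered by id and are assigned by position, which only lines up when the frame is already sorted by id. A possible variant, sketched here as a suggestion rather than as part of the answer, maps the per-id sums back through the id column instead. The names pairs, in_window and sums are made up for illustration, and the rows are deliberately reversed to show that the row order no longer matters.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": range(1, 11),
    "enddate": ["2019-10-05", "2019-10-06", "2019-10-07", "2019-10-08",
                "2019-10-01", "2019-10-02", "2019-10-10", "2019-10-12",
                "2019-10-14", "2019-10-15"],
    "startdate": ["2019-10-05", "2019-10-05", "2019-10-05", "2019-10-08",
                  "2019-10-01", "2019-10-01", "2019-10-10", "2019-10-10",
                  "2019-10-10", "2019-10-12"],
    "ownerId": [10, 10, 10, 10, 11, 11, 12, 12, 12, 12],
    "value": [105, 240, 420, 470, 320, 18, 50, 412, 398, 320],
})
df[["startdate", "enddate"]] = df[["startdate", "enddate"]].apply(pd.to_datetime)

# Reverse the rows on purpose: the result should not depend on row order.
df = df.iloc[::-1].reset_index(drop=True)

# Self-join on ownerId; right-hand columns that clash get the "_y" suffix.
pairs = df.merge(df[["ownerId", "enddate", "value"]],
                 on="ownerId", suffixes=("", "_y"))

# Keep pairs whose other end date lies in [startdate, enddate] of the left row.
in_window = pairs["enddate_y"].between(pairs["startdate"], pairs["enddate"])

# Sum the matched values per id, then align them back by id, not by position.
sums = pairs.loc[in_window].groupby("id")["value_y"].sum()
df["output"] = df["id"].map(sums).fillna(0).astype(int)

print(df.sort_values("id"))
```

The fillna(0) is a guard for ids with no match at all; with the sample data every row matches at least itself, so it never triggers here.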
