Show a cumulative sum column only when previous IDs fall between two values

id  endId  startId  ownerId  value
 1     50       50       10    105
 2     51       50       10    240
 3     52       50       10    420
 4     53       53       10    470
 5     40       40       11    320
 6     41       40       11     18
 7     55       55       12     50
 8     57       55       12    412
 9     59       55       12    398
10     60       57       12    320

id  endId  startId  ownerId  value  output
 1     50       50       10    105     105  # Nothing between 50 and 50
 2     51       50       10    240     345  # Found 1 record (endId with id 1)
 3     52       50       10    420     765  # Found 2 records (endId with id 1 and 2)
 4     53       53       10    470     470  # Nothing else between 53 and 53
 5     40       40       11    320     320  # Reset because Owner is different
 6     41       40       11     18     338  # Found 1 record (endId with id 5)
 7     55       55       12     50      50  # ...
 8     57       55       12    412     462
 9     59       55       12    398     860
10     60       57       12    320    1130  # Found 3 records between 57 and 60 (endId with id 8, 9 and 10)
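To try the answers locally, the sample data above can be rebuilt as a DataFrame. This is only a convenience sketch: the variable names `df` and `expected` are assumptions chosen to match the answers, and `expected` simply copies the `output` column from the question.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "id":      [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "endId":   [50, 51, 52, 53, 40, 41, 55, 57, 59, 60],
    "startId": [50, 50, 50, 53, 40, 40, 55, 55, 55, 57],
    "ownerId": [10, 10, 10, 10, 11, 11, 12, 12, 12, 12],
    "value":   [105, 240, 420, 470, 320, 18, 50, 412, 398, 320],
})

# Desired result, copied from the question's "output" column
expected = [105, 345, 765, 470, 320, 338, 50, 462, 860, 1130]

print(df)
```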

2 Answers

User

Answer 1 · edited 2024-09-28 01:24:00

I copied df to df2 to preserve the original data. I suggest splitting the task into two steps:

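import numpy as np

# Work on a copy so the original df stays untouched
df2 = df.copy()

# Step 1: cumulative sum of value within each ownerId
df2['output'] = df2.groupby('ownerId')['value'].cumsum()

# Step 2: where endId <= startId, restart from the row's own value;
# otherwise keep the cumulative sum from step 1
df2['output'] = np.where(df2['endId'] <= df2['startId'],
                         df2['value'],
                         df2['output'])

print(df2)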
   id  endId  startId  ownerId  value  output
0   1     50       50       10    105     105
1   2     51       50       10    240     345
2   3     52       50       10    420     765
3   4     53       53       10    470     470
4   5     40       40       11    320     320
5   6     41       40       11     18     338
6   7     55       55       12     50      50
7   8     57       55       12    412     462
8   9     59       55       12    398     860
9  10     60       57       12    320    1180

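Logic:

I'm sorry, but I still don't understand. For ownerId 10 and 11, records whose endId and startId share the same value are counted in the cumulative sum, which looks fine. But for some reason you say the same rule does not apply to ownerId 12. I understand that ids 7 through 10 should be considered. The pattern seems to be that the sum restarts when endId and startId match at the highest value, as happens with id 4. Do you know?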
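The disagreement can be made concrete with a small check. This is a sketch, assuming the desired values are those in the question's output column; `expected` and `mismatch_ids` are names introduced here for illustration. It applies the two steps above and lists the ids where the result differs from what was asked for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":      [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "endId":   [50, 51, 52, 53, 40, 41, 55, 57, 59, 60],
    "startId": [50, 50, 50, 53, 40, 40, 55, 55, 55, 57],
    "ownerId": [10, 10, 10, 10, 11, 11, 12, 12, 12, 12],
    "value":   [105, 240, 420, 470, 320, 18, 50, 412, 398, 320],
})
# Desired output, copied from the question (assumption: this is the target)
expected = np.array([105, 345, 765, 470, 320, 338, 50, 462, 860, 1130])

# Step 1: per-owner cumulative sum; step 2: restart where endId <= startId
cum = df.groupby("ownerId")["value"].cumsum()
out = np.where(df["endId"] <= df["startId"], df["value"], cum)

# ids whose computed output differs from the desired one
mismatch_ids = df.loc[out != expected, "id"].tolist()
print(mismatch_ids)
```

Only id 10 differs: its window starts at 57, so the value of id 7 (endId 55) should be excluded, which a plain per-owner cumulative sum cannot express.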

User

Answer 2 · edited 2024-09-28 01:24:00

I would use numpy broadcasting to identify the rows you are looking for:

import numpy as np
import pandas as pd

# Create new df with ownerId as index
df2 = df.set_index('ownerId')
df2['output'] = 0

# Loop over the distinct ownerIds (each one once)
for k in df2.index.unique():
    refend = df2.loc[k, 'endId'].values
    refstart = df2.loc[k, 'startId'].values

    # Pairs (i, j) where row i's endId lies in [startId, endId] of row j
    i, j = np.where((refend[:, None] <= refend) & (refend[:, None] >= refstart))
    # Sum the matched values for each target endId
    dfres = pd.concat([df2.loc[k].iloc[j].endId.reset_index(drop=True),
                       df2.loc[k].iloc[i].value.reset_index(drop=True)],
                      axis=1).groupby('endId').sum()
    # Assumes endId is unique and ascending within each ownerId
    df2.loc[k, 'output'] = dfres.value.values

# reset index
df2.reset_index(inplace=True)

The output is:

   ownerId  id  endId  startId  value  output
0       10   1     50       50    105     105
1       10   2     51       50    240     345
2       10   3     52       50    420     765
3       10   4     53       53    470     470
4       11   5     40       40    320     320
5       11   6     41       40     18     338
6       12   7     55       55     50      50
7       12   8     57       55    412     462
8       12   9     59       55    398     860
9       12  10     60       57    320    1130
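To see what the broadcast comparison does, here is a minimal sketch restricted to the four rows of ownerId 12. The array names `end`, `start`, `value` and `mask` are chosen for illustration and are not part of the answer's code. Entry `mask[i, j]` is True when the endId of row i lies between the startId and endId of row j, so column j marks the rows that contribute to row j's output.

```python
import numpy as np

# Rows of ownerId 12 from the question (ids 7 to 10)
end = np.array([55, 57, 59, 60])
start = np.array([55, 55, 55, 57])
value = np.array([50, 412, 398, 320])

# end[:, None] has shape (4, 1); comparing it with shape (4,) arrays
# broadcasts to a (4, 4) grid of all (i, j) pairs
mask = (end[:, None] <= end) & (end[:, None] >= start)

# Sum the values of the contributing rows i for every target row j
output = (value[:, None] * mask).sum(axis=0)

print(mask.astype(int))
print(output)
```

The last column of `mask` excludes the first row (endId 55 is below startId 57), which is why id 10 ends up at 1130 rather than the plain cumulative sum.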

Edit

You can avoid the for loop as follows:

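refend = df.loc[:, 'endId'].values
refstart = df.loc[:, 'startId'].values

# Pairs (i, j) over all rows at once: endId of row i lies in [startId, endId] of row j
i, j = np.where((refend[:, None] <= refend) & (refend[:, None] >= refstart))

# Group by the owner of row i and the endId of row j, then sum the values
dfres = pd.concat([df.iloc[j].endId.reset_index(drop=True),
                   df.loc[:, ['ownerId', 'value']].iloc[i].reset_index(drop=True)],
                  axis=1).groupby(['ownerId', 'endId']).sum()

# Assumes df is sorted by ownerId and endId, and that ranges of different owners do not overlap
df['output'] = dfres.value.values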
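A possible variation on the loop-free approach, offered as a sketch rather than as the answer's own method: adding an ownerId comparison to the broadcast mask and summing along one axis keeps the result aligned with the original row order, so it does not depend on how the rows are sorted. The function name `windowed_sum` is hypothetical. The mask is n by n, so this is only reasonable for moderately sized frames.

```python
import numpy as np
import pandas as pd

def windowed_sum(df):
    """For each row j, sum value over rows i of the same ownerId
    whose endId lies in [startId[j], endId[j]]."""
    end = df["endId"].to_numpy()
    start = df["startId"].to_numpy()
    owner = df["ownerId"].to_numpy()
    value = df["value"].to_numpy()

    # mask[i, j]: row i contributes to row j
    mask = (
        (owner[:, None] == owner)
        & (end[:, None] >= start)
        & (end[:, None] <= end)
    )
    # Column sums give one result per row, in the original row order
    return (value[:, None] * mask).sum(axis=0)

df = pd.DataFrame({
    "id":      [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "endId":   [50, 51, 52, 53, 40, 41, 55, 57, 59, 60],
    "startId": [50, 50, 50, 53, 40, 40, 55, 55, 55, 57],
    "ownerId": [10, 10, 10, 10, 11, 11, 12, 12, 12, 12],
    "value":   [105, 240, 420, 470, 320, 18, 50, 412, 398, 320],
})
df["output"] = windowed_sum(df)
print(df)
```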
