Sum of values based on conditions in a table

Posted 2024-06-01 11:02:34


    ID        change         SX      Supresult
0  UNITY        NaN           0        NaN
1  UNITY    -0.009434        100     -0.015283 (P1)
2  UNITY     0.003463         0        NaN
3  TRINITY   0.008628        100     -0.043363 
4  TRINITY  -0.027374        100      0.008423 (P2)
5  TRINITY  -0.011002         0        NaN
6  TRINITY  -0.004987        100       NaN
7  TRINITY   0.007566         0        NaN

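When 'SX' equals 100, I use the code below to create a new column, 'Supresult'. The new column stores the sum of the next three 'change' values. For example, at index 1, Supresult is the sum of 'change' at indices 2, 3 and 4.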

df['Supresult'] = df[df.SX == 100].index.to_series().apply(lambda x: df.change.shift(-1).iloc[x: x + 3].sum())
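To reproduce the behaviour described here, the sample table can be rebuilt by hand and the one-liner run against it. This is only a sketch: the values are typed in from the table above, and the frame is assumed to have the default 0, 1, 2, ... index.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the table in the question.
df = pd.DataFrame({
    "ID": ["UNITY"] * 3 + ["TRINITY"] * 5,
    "change": [np.nan, -0.009434, 0.003463, 0.008628,
               -0.027374, -0.011002, -0.004987, 0.007566],
    "SX": [0, 100, 0, 100, 100, 0, 100, 0],
})

# The question's one-liner: for every row with SX == 100,
# sum the 'change' values of the rows that follow it (up to three).
df["Supresult"] = (
    df[df.SX == 100].index.to_series()
    .apply(lambda x: df.change.shift(-1).iloc[x:x + 3].sum())
)
print(df)
```

Rows where SX is not 100 end up as NaN because the assigned Series only carries the index labels of the SX == 100 rows.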

However, I am facing two problems that I need help with:

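(P1): I want the sum to be specific to each 'ID'. For example, the result at index 1 runs ahead and adds one value from UNITY to two values from TRINITY. The sum should only be computed when all of the values fall within the same 'ID'. I tried appending .groupby('ID') to the end of the code, but it raises a KeyError.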
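One possible way to keep each sum inside a single 'ID' is to compute the forward-looking sum per group. The sketch below is not the approach from the question or the answer; it assumes that a row with fewer than three following rows in the same ID (or with a NaN among them) should get NaN, and it addresses only (P1), not (P2).

```python
import numpy as np
import pandas as pd

# Sample data typed in from the table in the question.
df = pd.DataFrame({
    "ID": ["UNITY"] * 3 + ["TRINITY"] * 5,
    "change": [np.nan, -0.009434, 0.003463, 0.008628,
               -0.027374, -0.011002, -0.004987, 0.007566],
    "SX": [0, 100, 0, 100, 100, 0, 100, 0],
})

# Within each ID: rolling(3).sum() ends at the current row, so shifting it
# up by three rows gives the sum of the *next* three rows of the same ID.
forward_sum = df.groupby("ID")["change"].transform(
    lambda s: s.rolling(3).sum().shift(-3)
)

# Keep the value only where SX == 100; every other row becomes NaN.
df["Supresult"] = forward_sum.where(df["SX"] == 100)
print(df)
```

With this sketch, index 1 becomes NaN because UNITY has only one row after it, while indices 3 and 4 both still receive a value, so the overlap described in (P2) remains.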

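(P2): Since index 3 already gives the sum of the changes over the next three days, index 4 should not go on to compute a sum of its own next three days. The next sum should only be taken once the previous calculation period has finished (i.e. index 6 and onward).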
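The non-overlap rule in (P2) can be illustrated on plain row positions, independent of pandas. The helper name `pick_non_overlapping` is hypothetical, and the sketch assumes one reading of the rule: every row inside a kept window (here rows 4, 5 and 6 after a start at 3) is blocked from starting a new sum.

```python
def pick_non_overlapping(candidates, window=3):
    """Greedily keep candidate row positions that do not fall inside
    the window of an earlier kept candidate."""
    kept = []
    blocked_until = -1  # last row covered by the most recently kept window
    for pos in sorted(candidates):
        if pos > blocked_until:
            kept.append(pos)
            blocked_until = pos + window  # rows pos+1 .. pos+window are used up
    return kept


print(pick_non_overlapping([3, 4, 6]))  # [3]
print(pick_non_overlapping([3, 4, 7]))  # [3, 7]
```

If row 6 should instead be allowed to start a new sum, the comparison would need to be `pos >= blocked_until`; the question's wording leaves that open.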

Expected result:

    ID        change         SX      Supresult
0  UNITY        NaN           0        NaN
1  UNITY    -0.009434        100       NaN
2  UNITY     0.003463         0        NaN
3  TRINITY   0.008628        100     -0.043363 
4  TRINITY  -0.027374        100       NaN
5  TRINITY  -0.011002         0        NaN
6  TRINITY  -0.004987        100       NaN
7  TRINITY   0.007566         0        NaN

Any help would be greatly appreciated, thanks.


1 answer
User
Reply #1 · posted 2024-06-01 11:02:34

Given how involved your requirements are, I think a loop is appropriate:

import numpy as np

# If your data frame is not indexed sequentially, this will make it so.
# The algorithm needs the frame to be indexed 0, 1, 2, ...
# drop=True discards the old index instead of adding it as a column.
df.reset_index(drop=True, inplace=True)

# Every row starts off in "unconsumed" state
consumed = np.repeat(0, len(df))
result = np.repeat(np.nan, len(df))

for i, sx in df['SX'].items():  # Series.iteritems() was removed in pandas 2.0
    # The next three rows
    slc = slice(i+1, i+4)

    # A row is considered a match if:
    #   * It has SX == 100
    #   * The next three rows have the same ID
    #   * The row itself is not part of a previous summation window
    # Use .iloc here: .loc slices by label and includes the end label,
    # which would take four rows instead of three.
    match = (
        (sx == 100) and
        (df['ID'].iloc[slc].nunique() == 1) and
        (consumed[i] == 0)
    )
    if match:
        consumed[slc] = 1
        # Sum the next three 'change' values
        result[i] = df['change'].iloc[slc].sum()

df['Supresult'] = result
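As a check, here is a self-contained variant of this loop run on the question's sample data. It is a sketch under a few assumptions: the data is typed in from the table, the slices are positional (`.iloc`), the summed column is 'change', and the function name `forward_sum_no_overlap` is made up for illustration.

```python
import numpy as np
import pandas as pd


def forward_sum_no_overlap(df, window=3):
    """For each row with SX == 100, sum the next `window` 'change' values,
    skipping rows that mix IDs or that lie inside an earlier window."""
    df = df.reset_index(drop=True)
    consumed = np.zeros(len(df), dtype=bool)
    result = np.full(len(df), np.nan)
    for i, sx in enumerate(df["SX"]):
        nxt = slice(i + 1, i + 1 + window)  # positions of the following rows
        same_id = df["ID"].iloc[nxt].nunique() == 1
        if sx == 100 and same_id and not consumed[i]:
            consumed[nxt] = True
            result[i] = df["change"].iloc[nxt].sum()
    return pd.Series(result, index=df.index, name="Supresult")


df = pd.DataFrame({
    "ID": ["UNITY"] * 3 + ["TRINITY"] * 5,
    "change": [np.nan, -0.009434, 0.003463, 0.008628,
               -0.027374, -0.011002, -0.004987, 0.007566],
    "SX": [0, 100, 0, 100, 100, 0, 100, 0],
})
df["Supresult"] = forward_sum_no_overlap(df)
print(df)
```

On this data only index 3 receives a value, which agrees with the expected result in the question. Note that, like the loop above, the sketch compares the IDs of the following rows with each other, not with the ID of the current row.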
