Trouble using rolling on a DataFrame without apply, which is slow

ID  Date      Prize  IfWon
1   01-01-20  5      1
2   01-01-20  8      1
1   01-03-20  3      0
1   01-04-20  10     1
1   01-07-20  5      0
2   01-10-20  5      1
3   01-10-20  10     1

ID  Date      Prize  IfWon  PrevWon
1   01-01-20  5      1      0
2   01-01-20  8      1      0
1   01-03-20  3      0      5
1   01-04-20  10     1      5
1   01-07-20  5      0      15
2   01-10-20  5      1      0
3   01-10-20  10     1      0

def get_rolling_prize_sum(grp, freq):
    return grp.rolling(freq, on = 'Date', closed = 'right')['CurrentWon'].sum()

processed_data_df['CurrentWon'] = processed_data_df['Prize'] * processed_data_df['IfWon'] # gets deleted later
processed_data_df['PrevWon'] = processed_data_df.groupby('ID', group_keys=False).apply(get_rolling_prize_sum, '7D').astype(float) - processed_data_df['CurrentWon']

# Not using closed right here, just subtracting
processed_data_df['PrevWon'] = processed_data_df.groupby('ID', group_keys=False).rolling('7D', on = 'Date')['CurrentWon'].sum() - processed_data_df['CurrentWon']

ValueError: cannot join with no overlapping index names
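The ValueError most likely comes from index alignment rather than from the rolling sum itself. In recent pandas versions, the grouped rolling result appears to be indexed by the group key and the `on` column, while the original column still carries the frame's default unnamed index, so the subtraction has no shared index level to join on. A minimal sketch that inspects both indexes, using the sample data from the question (the variable names `rolled`, `rolled_index_names`, and `frame_index_names` are illustrative only):

```python
import pandas as pd

# Sample data from the question, built inline.
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 3],
    "Date": pd.to_datetime(
        ["01-01-20", "01-01-20", "01-03-20", "01-04-20",
         "01-07-20", "01-10-20", "01-10-20"],
        format="%m-%d-%y",
    ),
    "Prize": [5, 8, 3, 10, 5, 5, 10],
    "IfWon": [1, 1, 0, 1, 0, 1, 1],
})
df["CurrentWon"] = df["Prize"] * df["IfWon"]

# Grouped time-based rolling sum without apply.
rolled = df.groupby("ID").rolling("7D", on="Date")["CurrentWon"].sum()

# Compare the index level names of the two operands of the subtraction.
rolled_index_names = list(rolled.index.names)
frame_index_names = list(df["CurrentWon"].index.names)
print(rolled_index_names)
print(frame_index_names)
```

If the two lists share no name, pandas cannot align the operands, which matches the error message above. The exact index layout may differ in older pandas releases.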

1 answer

User

#1 · Posted 2024-09-30 18:28:45

Improved on my previous answer and managed to solve the ordering problem caused by groupby.

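import pandas as pd

df = pd.read_csv("data.csv")
df["Date"] = pd.to_datetime(df['Date'], format='%m-%d-%y')
# Amount actually won on each row (0 when IfWon is 0)
df["CurrentWon"] = df["Prize"] * df["IfWon"]

# Per-ID 7-day rolling sum; reset_index turns the (ID, Date) index back into columns
result = df.groupby("ID").rolling("7D", on = 'Date', closed = 'right').CurrentWon.sum().reset_index()
result.rename(columns={"CurrentWon": "PreviousWon"}, inplace=True)
# Merging on ID and Date restores the original row order
df = df.merge(result, on=["ID", "Date"])
# The window includes the current row, so subtract it to keep only earlier winnings
df["PreviousWon"] -= df["CurrentWon"]
print(df)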

Output:

   ID       Date  Prize  IfWon  CurrentWon  PreviousWon
0   1 2020-01-01      5      1           5          0.0
1   2 2020-01-01      8      1           8          0.0
2   1 2020-01-03      3      0           0          5.0
3   1 2020-01-04     10      1          10          5.0
4   1 2020-01-07      5      0           0         15.0
5   2 2020-01-10      5      1           5          0.0
6   3 2020-01-10     10      1          10          0.0
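As a possible variation on the answer, the merge can be avoided by looking up the rolling sums through a (ID, Date) MultiIndex built from the original rows. This is only a sketch: it assumes each (ID, Date) pair is unique and that dates are sorted within each ID, it builds the question's sample data inline instead of reading `data.csv`, and the names `rolled` and `keys` are illustrative.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from a CSV.
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 3],
    "Date": pd.to_datetime(
        ["01-01-20", "01-01-20", "01-03-20", "01-04-20",
         "01-07-20", "01-10-20", "01-10-20"],
        format="%m-%d-%y",
    ),
    "Prize": [5, 8, 3, 10, 5, 5, 10],
    "IfWon": [1, 1, 0, 1, 0, 1, 1],
})
df["CurrentWon"] = df["Prize"] * df["IfWon"]

# Per-ID 7-day rolling sum; the result is ordered by group, not by original row.
rolled = df.groupby("ID").rolling("7D", on="Date")["CurrentWon"].sum()

# Build the matching (ID, Date) key for every original row, then look up
# the rolling sums in original row order (assumes unique ID/Date pairs).
keys = pd.MultiIndex.from_frame(df[["ID", "Date"]])
df["PrevWon"] = rolled.reindex(keys).to_numpy() - df["CurrentWon"].to_numpy()

print(df)
```

Working on plain arrays for the final subtraction sidesteps index alignment entirely. If an ID can have several rows on the same date, the lookup by key is ambiguous and the merge-based version in the answer is the safer choice.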
