pandas groupby: aggregate only the rows common to two consecutive groups

import pandas as pd

df = pd.DataFrame({
    'Date':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'ID':    [1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5, 1, 2, 3, 4],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
})

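For each Date I want to sum Value only over the IDs that also appear in the previous Date. The first Date keeps all of its rows. My current loop gives the expected result:

tmpL = df.groupby('Date')['ID'].apply(list)
tmpV = df.groupby('Date')['Value'].sum()
for i in range(1, tmpL.shape[0]):
    # IDs present on this Date but not on the previous one
    res = list(set(tmpL.iloc[i]) - set(tmpL.iloc[i - 1]))
    v = df.loc[df.ID.isin(res) & (df.Date == tmpL.index[i]), 'Value'].sum()
    tmpV.iloc[i] = tmpV.iloc[i] - v
print(tmpV)
Date
1    10
2    18
3    27
4    42
Name: Value, dtype: int64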
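The same rule can also be written without a Python loop as a semi-join. This is a minimal sketch, not taken from either answer, assuming (as the loop does) that "previous" means the previous distinct Date in sorted order and that each (Date, ID) pair identifies a row. The names `next_date` and `seen_before` are introduced here for illustration. Each (Date, ID) pair is relabelled with the following Date, and a left merge with `indicator=True` marks which rows had their ID present one group earlier.

```python
import pandas as pd

df = pd.DataFrame({
    'Date':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'ID':    [1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5, 1, 2, 3, 4],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
})

# Sorted distinct Dates; each Date's "previous group" is the one before it.
dates = sorted(df['Date'].unique())
next_date = dict(zip(dates[:-1], dates[1:]))

# Relabel every (Date, ID) pair with the following Date, so a row in
# `seen_before` means "this ID was present in the previous group".
seen_before = (df[['Date', 'ID']]
               .assign(Date=df['Date'].map(next_date))
               .dropna(subset=['Date'])              # the last Date has no successor
               .astype({'Date': df['Date'].dtype})
               .drop_duplicates())

# Semi-join: indicator=True adds a `_merge` column ('both' / 'left_only').
merged = df.merge(seen_before, on=['Date', 'ID'], how='left', indicator=True)
keep = (merged['_merge'] == 'both') | (merged['Date'] == dates[0])

# reindex keeps Dates whose IDs are all new (total 0), as the loop would.
result = (merged.loc[keep]
          .groupby('Date')['Value'].sum()
          .reindex(dates, fill_value=0))
print(result)
```

The `drop_duplicates` call guards against a left merge multiplying rows if an ID were repeated within a Date; with the sample data it changes nothing.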

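2 answers

User

Answer 1 · edited 2024-09-29 23:29:01

Use DataFrame.pivot_table with aggfunc='sum', compare each row's notna mask with the previous row using DataFrame.diff, and finally pass the result to DataFrame.mask and sum: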

df1 = df.pivot_table(index='Date', columns='ID', values='Value', aggfunc='sum')
s = df1.mask(df1.notna().diff().fillna(False)).sum(axis=1)
print (s)
Date
1    10.0
2    18.0
3    27.0
4    42.0
dtype: float64
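`df1.notna().diff()` flags every cell whose presence changed from the previous Date. Cells that disappeared are already NaN, so in effect only IDs that newly appeared get masked. `diff` on a boolean frame typically returns an object-dtype result with NaN in the first row, which is why `fillna(False)` is needed. The sketch below is a swapped-in variant, not the answer's exact code: it builds the "newly appeared" mask directly with `shift(1, fill_value=True)`, so it stays boolean throughout. The names `present`, `present_before` and `new_ids` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'ID':    [1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5, 1, 2, 3, 4],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
})

# Wide table: one row per Date, one column per ID, NaN where the ID is absent.
wide = df.pivot_table(index='Date', columns='ID', values='Value', aggfunc='sum')
present = wide.notna()

# Presence on the previous Date. fill_value=True treats every ID on the first
# Date as "already seen", so nothing on the first Date gets masked.
present_before = present.shift(1, fill_value=True).astype(bool)

# Cells that are present on this Date but were absent on the previous one.
new_ids = present & ~present_before
print(new_ids)

# Hide the newly appeared cells, then total each Date (NaN is skipped).
s = wide.mask(new_ids).sum(axis=1)
print(s)
```

Printing `new_ids` shows exactly two True cells, ID 5 on Date 3 and ID 1 on Date 4, which are the rows the question's loop subtracts.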

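My first solution, which I expect to be slower:

Convert the IDs of each Date to sets, then use Series.diff (set difference with the previous Date) and Series.explode, get all matching values from the original data via DataFrame.merge, aggregate with sum, and finally subtract: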

tmpL = (df.groupby('Date')['ID'].apply(set)
          .diff()
          .explode()
          .reset_index()
          .merge(df)
          .groupby('Date')['Value']
          .sum())
tmpV = df.groupby('Date')['Value'].sum()

out = tmpV.sub(tmpL, fill_value=0)
print (out)
Date
1    10.0
2    18.0
3    27.0
4    42.0
Name: Value, dtype: float64
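The set-based idea can be spelled out step by step in plain Python. This sketch is a rewrite for readability, not the answer's code: it computes the set difference with an explicit loop over consecutive Dates instead of `Series.diff` on object dtype, then drops the rows of newly appeared IDs before summing. That is equivalent to `tmpV - tmpL` above. The dictionary `new_ids` is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'Date':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'ID':    [1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5, 1, 2, 3, 4],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
})

# One set of IDs per Date; groupby sorts the Date keys by default.
ids = df.groupby('Date')['ID'].apply(set)

# IDs that are new compared with the previous Date (none on the first Date).
new_ids = {ids.index[0]: set()}
for prev_date, date in zip(ids.index[:-1], ids.index[1:]):
    new_ids[date] = ids.loc[date] - ids.loc[prev_date]

# Flag each row whose ID is new on its Date.
is_new = pd.Series([i in new_ids[d] for d, i in zip(df['Date'], df['ID'])],
                   index=df.index)

# Dropping the flagged rows before summing equals "total minus new values".
out = (df.loc[~is_new]
       .groupby('Date')['Value'].sum()
       .reindex(ids.index, fill_value=0))
print(out)
```

Because the rows are dropped rather than subtracted afterwards, the result stays integer and no `fill_value` alignment is needed. The `reindex` only matters if every ID on some Date were new.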

User

Answer 2 · edited 2024-09-29 23:29:01

Try:

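wide = df.pivot_table(index='Date', columns='ID', values='Value')
# Keep a cell only if the ID is present on this Date and on the previous Date
condition = wide.notna() & wide.notna().shift(1, fill_value=False)
condition.iloc[0, :] = True  # the first Date has no predecessor, so keep all of its IDs
print(wide[condition].sum(axis=1))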

Output:

Date
1    10.0
2    18.0
3    27.0
4    42.0
dtype: float64
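This answer calls pivot_table without aggfunc, so it uses the default aggfunc='mean', whereas answer 1 passes aggfunc='sum'. With the question's data each (Date, ID) pair occurs once, so both give the same table. If a pair could repeat, though, the averaged cell would no longer match the loop's sum. The sketch below uses a small hypothetical frame with one duplicated pair and a helper `common_totals` (a name introduced here) that applies this answer's condition to either table.

```python
import pandas as pd

# Hypothetical data with a duplicated (Date, ID) pair: Date 1 has ID 1 twice.
dup = pd.DataFrame({'Date': [1, 1, 2, 2],
                    'ID': [1, 1, 1, 2],
                    'Value': [1, 2, 3, 4]})


def common_totals(wide):
    """Sum each row over the IDs also present in the previous row."""
    present = wide.notna()
    condition = present & present.shift(1, fill_value=False)
    condition.iloc[0, :] = True  # the first Date keeps all of its IDs
    return wide[condition].sum(axis=1)


# Default aggfunc='mean' averages the duplicated cell; aggfunc='sum' adds it up.
mean_wide = dup.pivot_table(index='Date', columns='ID', values='Value')
sum_wide = dup.pivot_table(index='Date', columns='ID', values='Value', aggfunc='sum')

print(common_totals(mean_wide).tolist())  # Date 1 total is the mean of 1 and 2
print(common_totals(sum_wide).tolist())   # Date 1 total is 1 + 2, as in the loop
```

Passing aggfunc='sum' explicitly keeps the pivot-based answers consistent with the loop in the question when duplicates are possible.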
