Group by multiple columns and return the sum of a column in a for loop

    # Get the first Start Date
    minStartDate = df.loc[df['ID'] == 56886, 'Start Date'].min()
    # Get the last End Date
    maxEndDate = df.loc[df['ID'] == 56886, 'End Date'].max()
    # Get the Value sum
    sumValue = df.loc[df['ID'] == 56886, 'Value'].sum()

    minStartDate = {}
    maxEndDate = {}
    summyValue = {}
    Key = {}
    # Column name is 'ID' (no leading space); visit each distinct ID once
    ID = df['ID'].unique()
    for i in ID:
        Key[i] = df.loc[df['ID'] == i, 'ID']
        # Get the first Start Date
        minStartDate[i] = df.loc[df['ID'] == i, 'Start Date'].min()
        # Get the last End Date
        maxEndDate[i] = df.loc[df['ID'] == i, 'End Date'].max()
        # Get the Value sum
        summyValue[i] = df.loc[df['ID'] == i, 'Value'].sum()
    print(summyValue, minStartDate, maxEndDate)

1 Answer

User
#1 · Posted 2024-09-29 23:15:01

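If you want to fill Random 1 and Random 2 with the most frequently repeated value, you can use your own function, e.g.

    import pandas as pd

    df = pd.DataFrame({
        'id': [1, 1, 1, 1, 2, 2, 2],
        'r1': ['x', 'y', 'y', 'y', 'x', 'x', 'x'],
        'r2': ['t', 'I', 't', 't', 'c', 'c', 'c'],
    })

    def max_rep(x):
        # Return the most frequent value in the group
        return x.value_counts().idxmax()

    ndf = df.groupby('id', as_index=False).agg({'r1': max_rep, 'r2': max_rep})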
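Or use a lambda if you want it on one line, for example

    ndf = df.groupby('id', as_index=False).agg(lambda x: x.value_counts().idxmax())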
Output:

       id r1 r2
    0   1  y  t
    1   2  x  c
As Jon said, you can use agg to do all the steps in a single call, i.e.

    df.groupby('ID', as_index=False).agg({'Start Date': 'min', 'End Date': 'max', 'Value': 'sum',
                                          'Random 1': max_rep, 'Random 2': max_rep})
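To see that single agg call end to end, here is a minimal sketch on a made-up frame. The sample rows and the second ID (56887) are assumptions for illustration only; the column names follow the question, and `max_rep` is the same helper as in this answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'ID': [56886, 56886, 56886, 56887, 56887],
    'Start Date': pd.to_datetime(['2017-01-05', '2017-02-01', '2017-03-10',
                                  '2017-01-20', '2017-04-02']),
    'End Date': pd.to_datetime(['2017-01-31', '2017-02-28', '2017-03-31',
                                '2017-02-15', '2017-04-30']),
    'Value': [10, 20, 30, 5, 7],
    'Random 1': ['x', 'y', 'y', 'a', 'a'],
    'Random 2': ['t', 't', 'u', 'c', 'c'],
})

def max_rep(s):
    # Most frequent value within the group
    return s.value_counts().idxmax()

# One row per ID: earliest start, latest end, total value, most common labels
result = df.groupby('ID', as_index=False).agg({
    'Start Date': 'min',
    'End Date': 'max',
    'Value': 'sum',
    'Random 1': max_rep,
    'Random 2': max_rep,
})
print(result)
```

This replaces the explicit loop over IDs from the question: each dictionary key names a column, and each value names the reduction applied to that column within a group.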
If you want to group by Random 1 and Random 2 as well, you can use

    df.groupby(['ID', 'Random 1', 'Random 2'], as_index=False).agg({'Start Date': 'min', 'End Date': 'max', 'Value': 'sum'})
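Grouping on several keys changes the number of output rows, which is easy to miss. The sketch below uses made-up data (all values are assumptions) to compare grouping by `ID` alone with grouping by `ID`, `Random 1` and `Random 2` together.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2],
    'Random 1': ['x', 'x', 'y', 'x'],
    'Random 2': ['t', 't', 't', 'c'],
    'Start Date': pd.to_datetime(['2017-01-01', '2017-02-01',
                                  '2017-03-01', '2017-01-15']),
    'End Date': pd.to_datetime(['2017-01-31', '2017-02-28',
                                '2017-03-31', '2017-02-15']),
    'Value': [10, 20, 30, 5],
})

# One row per ID
by_id = df.groupby('ID', as_index=False).agg({'Value': 'sum'})

# One row per distinct (ID, Random 1, Random 2) combination
by_all = df.groupby(['ID', 'Random 1', 'Random 2'], as_index=False).agg({
    'Start Date': 'min',
    'End Date': 'max',
    'Value': 'sum',
})

print(by_id)
print(by_all)
```

Here ID 1 splits into two rows in `by_all` because its `Random 1` value is not constant. Multi-key grouping yields one row per ID only when the extra columns never vary within an ID; otherwise the `max_rep` approach keeps a single row per ID.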
