Pandas: sum with groupby, but exclude certain columns

Code  Country      Item_Code  Item   Ele_Code  Unit  Y1961  Y1962  Y1963
2     Afghanistan  15         Wheat  5312      Ha    10     20     30
2     Afghanistan  25         Maize  5312      Ha    10     20     30
4     Angola       15         Wheat  7312      Ha    30     40     50
4     Angola       25         Maize  7312      Ha    30     40     50

Code  Country      Item_Code  Item  Ele_Code  Unit  Y1961  Y1962  Y1963
2     Afghanistan  15         C3    5312      Ha    20     40     60
4     Angola       25         C4    7312      Ha    60     80     100

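3 Answers

User

#1 · Edited 2024-07-03 08:10:11

If you are looking for a more general approach that works for many columns, you can build a list of column names and use it to index the grouped DataFrame. In your case, for example: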

columns = ['Y' + str(year) for year in range(1967, 2011)]

df.groupby('Country')[columns].agg('sum')
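As a minimal sketch of the idea above, the following rebuilds the sample table from the question and collects the year columns by prefix instead of hard-coding a range. The variable names `year_columns` and `totals` are illustrative, and the prefix test assumes that only the year columns start with "Y", which holds for this sample but may not hold for other data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Assumption: every column whose name starts with "Y" is a year column.
year_columns = [col for col in df.columns if col.startswith("Y")]

# Sum only the year columns within each country.
totals = df.groupby("Country")[year_columns].agg("sum")
print(totals)
```

Building the list from `df.columns` guarantees that every name in it exists, so the selection cannot fail on a missing year.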

User

#2 · Edited 2024-07-03 08:10:11

You can select the columns to aggregate from the groupby object:

In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50

Note that the list you pass must be a subset of the DataFrame's columns; otherwise you will get a KeyError.
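The KeyError mentioned above can be reproduced with a small sketch. The column "Y1999" is a made-up name that is deliberately absent from the sample data, and filtering the request against `df.columns` is just one possible way to guard against it, not something the original answer prescribes.

```python
import pandas as pd

# Reduced version of the question's sample data.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# "Y1999" is hypothetical and does not exist in df.
requested = ["Y1961", "Y1963", "Y1999"]

error_message = None
try:
    df.groupby("Country")[requested].sum()
except KeyError as exc:
    # Selecting a missing column from the groupby object raises KeyError.
    error_message = str(exc)
    print("KeyError:", error_message)

# Keep only the requested columns that are actually present.
available = [col for col in requested if col in df.columns]
totals = df.groupby("Country")[available].sum()
print(totals)
```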

User

#3 · Edited 2024-07-03 08:10:11

The agg function will do this for you. Pass a dict that maps each column to the function or functions to apply to it:

df.groupby(['Country', 'Item_Code']).agg({'Y1961': np.sum, 'Y1962': [np.sum, np.mean]})  # Added example for two output columns from a single input column

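The result contains only the group-by columns and the aggregated columns you specified. In this example, I applied two agg functions to 'Y1962'.

To get the output you are after, include the other columns in the group-by keys and sum the Y columns: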

df.groupby(['Code', 'Country', 'Item_Code', 'Item', 'Ele_Code', 'Unit']).agg({'Y1961': np.sum, 'Y1962': np.sum, 'Y1963': np.sum})
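For comparison, here is a sketch that reproduces the year totals from the desired output in the question (20, 40, 60 and 60, 80, 100). It rests on an assumption that is not in the answer above: `Item_Code` and `Item` are left out of the group-by keys, because they differ between rows of the same country and would otherwise keep those rows apart. The aggregation uses the string name "sum" instead of `np.sum`, so no NumPy import is needed. The names `group_keys`, `year_columns`, and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Assumption: these columns are constant within a country, so they are safe keys.
group_keys = ["Code", "Country", "Ele_Code", "Unit"]
year_columns = ["Y1961", "Y1962", "Y1963"]

# as_index=False keeps the keys as ordinary columns in the result.
result = df.groupby(group_keys, as_index=False).agg(
    {col: "sum" for col in year_columns}
)
print(result)
```

The result has one row per country. It does not contain `Item_Code` or `Item`; how those should be filled in (the question shows values such as C3 and C4) is not specified in the thread, so the sketch leaves them out.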
