Multiple groupby aggregation functions depending on whether a column name is in a DataFrame

Posted 2024-09-30 10:38:57


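I have two DataFrames.

One of them summarizes census data for areas of London. It has two types of columns: some can be summed because they are absolute counts, and others need to be averaged because they are percentages.

I want to group the census data by borough. In the other DataFrame I have a list of the columns that hold percentages; these should be averaged when grouping, and all other columns should be summed.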
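A minimal sketch of the mapping described above, assuming the percentage column names are available as a plain Python list (`pc_cols` and the two column names are illustrative, not from the original code): a single dict comprehension assigns "mean" to percentage columns and "sum" to everything else.

```python
# Illustrative column names; pc_cols stands in for the list of percentage columns.
columns = ["N of all usual residents", "age=0 to 4: Population % by age"]
pc_cols = ["age=0 to 4: Population % by age"]

# A set gives fast, unambiguous membership tests on the names themselves.
pc_set = set(pc_cols)

# Map every column to "mean" if it is a percentage column, otherwise to "sum".
agg_spec = {col: ("mean" if col in pc_set else "sum") for col in columns}
print(agg_spec)
```

A dict like this can be passed straight to `.agg(...)`; when building it from `censusDF.columns`, the grouping column "Borough" would normally be left out.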

What I have so far:

test = censusDF.groupby(['Borough'], as_index = False).agg({pc_cols_df:'mean',
                                                        i for i not in pc_cols_df : 'sum'
                                                       })
test

This gives me the following error:

  File "<ipython-input-84-6a20dc571632>", line 2
  for i not in pc_cols_df : 'sum'
  ^
  SyntaxError: invalid syntax

I also tried:

test = censusDF.groupby(['Borough'], as_index = False).agg({pc_cols_df.values.tolist():'mean'})
test

and got this error:

TypeError: unhashable type: 'list'
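The TypeError arises because dictionary keys must be hashable and a list is not, so a whole list cannot serve as one key meaning "each of these columns". A small sketch (column names are illustrative) showing the failure and the standard `dict.fromkeys`, which creates one key per column, all mapped to the same value:

```python
pc_cols = ["age=All ages: Population % by age", "age=0 to 4: Population % by age"]

# A list cannot be a dict key: lists are mutable and therefore unhashable.
try:
    {pc_cols: "mean"}
except TypeError as exc:
    print(f"TypeError: {exc}")

# dict.fromkeys gives every column its own key with the same aggregation.
mean_spec = dict.fromkeys(pc_cols, "mean")
print(mean_spec)
```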

Examples of column names that should be averaged:

age=All ages: Population % by age
age=0 to 4: Population % by age
age=5 to 7: Population % by age
age=8 to 9: Population % by age
age=10 to 14: Population % by age
age=15: Population % by age
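In these examples every column that should be averaged contains "%" in its name. If that holds for the whole dataset (an assumption about the naming scheme; the separate DataFrame of percentage columns remains the authoritative source), the list could also be derived from the names themselves:

```python
columns = [
    "Borough",
    "N of all usual residents",
    "age=All ages: Population % by age",
    "age=0 to 4: Population % by age",
]

# Assumption: percentage columns are exactly those whose name contains "%".
pc_cols = [col for col in columns if "%" in col]
print(pc_cols)
```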

Sample of the census DataFrame:

id, Name, Borough, N of all usual residents, distance to work=Work mainly at or from home: Population N by distance travelled to work, distance to work=Other: Population N by distance travelled to work, Total distance to work (km), Average distance to work (km), age=All ages: Population % by age, age=0 to 4: Population % by age, age=5 to 7: Population % by age, age=8 to 9: Population % by age, age=10 to 14: Population % by age
E05000039, Thames, BarkingDagenham, 10728, 315, 569, 44684.2, 13.8, 100, 12.9, 5.8, 3.4, 6.9
E05000040, Valence, BarkingDagenham, 9867, 240, 526, 41897.9, 13.2, 100, 9.8, 4.7, 2.8, 7
E05000041, Village, BarkingDagenham, 10787, 238, 585, 51537.5, 14.7, 100, 9.7, 4.3, 2.6, 6.8
E05000042, Whalebone, BarkingDagenham, 10575, 299, 567, 54068.4, 14.1, 100, 8.9, 4.3, 2.6, 6.5
E05000043, Brunswick Park, Barnet, 16394, 832, 892, 72028.8, 11.7, 100, 6.4, 3.6, 2.6, 6.6
E05000044, Burnt Oak, Barnet, 18217, 611, 1226, 68000.4, 11.4, 100, 8.4, 4.6, 2.8, 7.2
E05000045, Childs Hill, Barnet, 20049, 1301, 1300, 69172.1, 9.7, 100, 7, 3.4, 2.1, 5.4
E05000046, Colindale, Barnet, 17098, 583, 1145, 65002, 11.2, 100, 8.5, 4.2, 2.4, 6
E05000047, Coppetts, Barnet, 17250, 936, 1036, 75344.7, 11, 100, 7.3, 3.7, 2.1, 5.4
E05000048, East Barnet, Barnet, 16137, 776, 863, 79660, 12.8, 100, 7.2, 3.9, 2.4, 6
E05000049, East Finchley, Barnet, 15989, 883, 946, 72995.5, 11.1, 100, 7.1, 3.7, 2, 4.9
E05000050, Edgware, Barnet, 16728, 999, 887, 69743.2, 12.2, 100, 7.8, 4.3, 3, 7
E05000051, Finchley Church End, Barnet, 15715, 1272, 842, 62194.5, 10.9, 100, 6.6, 3.7, 2.4, 5.1
E05000052, Garden Suburb, Barnet, 15929, 1485, 636, 59431.5, 10.4, 100, 7.5, 3.7, 2.4, 5.7
E05000053, Golders Green, Barnet, 18818, 1155, 986, 53137.1, 9.2, 100, 9.3, 5.6, 3.1, 7.9
E05000054, Hale, Barnet, 17437, 967, 980, 76701.1, 12.4, 100, 8.2, 4.1, 2.4, 6.9
E05000055, Hendon, Barnet, 18472, 1099, 1219, 66641.3, 10.5, 100, 8.1, 3.7, 2.2, 5
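To experiment with the sample, a few of its rows can be loaded with `pandas.read_csv`. This sketch re-types three rows as comma-separated text; `skipinitialspace=True` drops the blank after each comma in both the header and the values.

```python
import io

import pandas as pd

# Three rows of the sample above, re-typed as CSV text.
sample = """id, Name, Borough, N of all usual residents, distance to work=Work mainly at or from home: Population N by distance travelled to work, distance to work=Other: Population N by distance travelled to work, Total distance to work (km), Average distance to work (km), age=All ages: Population % by age, age=0 to 4: Population % by age, age=5 to 7: Population % by age, age=8 to 9: Population % by age, age=10 to 14: Population % by age
E05000039, Thames, BarkingDagenham, 10728, 315, 569, 44684.2, 13.8, 100, 12.9, 5.8, 3.4, 6.9
E05000043, Brunswick Park, Barnet, 16394, 832, 892, 72028.8, 11.7, 100, 6.4, 3.6, 2.6, 6.6
E05000044, Burnt Oak, Barnet, 18217, 611, 1226, 68000.4, 11.4, 100, 8.4, 4.6, 2.8, 7.2
"""

# skipinitialspace strips the space that follows each comma.
censusDF = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
print(censusDF.shape)
```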

1 answer
Anonymous user
#1 · Posted 2024-09-30 10:38:57

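You get a syntax error because this is not valid dictionary comprehension syntax: "for i not in pc_cols_df" is not a legal loop clause, and you cannot write i for i not in pc_cols_df : 'sum' and expect Python to know that you mean the columns of censusDF (or at least, I assume that is what you are trying to do).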
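In a dict comprehension the condition goes in an `if` clause after the `for` clause: `{key: value for item in iterable if condition}`. A small sketch with hypothetical column lists (`all_cols`, `pc_cols`):

```python
all_cols = [
    "N of all usual residents",
    "Total distance to work (km)",
    "age=0 to 4: Population % by age",
]
pc_cols = ["age=0 to 4: Population % by age"]

# Valid form: loop over all columns, then filter with "if ... not in ...".
sum_spec = {col: "sum" for col in all_cols if col not in pc_cols}
print(sum_spec)
```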

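Convert pc_cols_df to a plain list of column names (there is no need to keep it as a DataFrame). Avoid a Series here: "col not in some_series" tests the Series index rather than its values, so the percentage columns would end up summed instead of averaged. With a list, the following code does what you want: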

# average the percentage columns, sum every other column except the grouping key
censusDF.groupby('Borough', as_index=False).agg(
    {**{col: 'mean' for col in pc_cols_df},
     **{col: 'sum' for col in censusDF.columns if col not in pc_cols_df and col != 'Borough'}})
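A self-contained sketch of this approach on four rows of the sample, using only a subset of the columns and a hypothetical `pc_cols` list in place of `pc_cols_df`. The text columns `id` and `Name` are left out, since "summing" them would only concatenate strings.

```python
import pandas as pd

# Four sample rows with a subset of the columns (values from the question).
censusDF = pd.DataFrame({
    "Borough": ["BarkingDagenham", "BarkingDagenham", "Barnet", "Barnet"],
    "N of all usual residents": [10728, 9867, 16394, 18217],
    "age=0 to 4: Population % by age": [12.9, 9.8, 6.4, 8.4],
    "age=5 to 7: Population % by age": [5.8, 4.7, 3.6, 4.6],
})

# Hypothetical stand-in for pc_cols_df, as a plain list of column names.
pc_cols = ["age=0 to 4: Population % by age", "age=5 to 7: Population % by age"]

# Averaged columns first; every other column except the group key is summed.
agg_spec = {
    **{col: "mean" for col in pc_cols},
    **{col: "sum" for col in censusDF.columns
       if col not in pc_cols and col != "Borough"},
}

result = censusDF.groupby("Borough", as_index=False).agg(agg_spec)
print(result)
```

Each borough's resident count is the sum over its rows, while each age column is the plain mean of the row percentages.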

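I don't know which Python version you are using; the {**a, **b} dictionary merge syntax requires Python 3.5 or later, so it will fail on older versions.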
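Where the `{**a, **b}` unpacking syntax is unavailable, the same mapping can be built with `dict.update`, which needs no unpacking syntax. A sketch with illustrative column lists:

```python
all_cols = ["Borough", "N of all usual residents", "age=0 to 4: Population % by age"]
pc_cols = ["age=0 to 4: Population % by age"]

# Start with the summed columns, skipping the percentage columns and the group key.
agg_spec = dict((col, "sum") for col in all_cols if col not in pc_cols and col != "Borough")

# Then add the averaged columns.
agg_spec.update(dict.fromkeys(pc_cols, "mean"))
print(agg_spec)
```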
