Summing DataFrame columns for a weighted average

Posted 2024-10-03 13:20:26


Background: I have a pandas DataFrame, scaledData, which is just a standard df of information, as shown below:

                  COL NAME0 COL NAME1  ...    COL NAME3    COL NAME4
0                Alabama     4.099099  ...    2.042345      1.392755
1                 Alaska     1.396396  ...    1.000000      1.000000
2                Arizona     4.189189  ...    2.003257      1.537777
3               Arkansas     2.927928  ...    2.208723      1.007370
4             California     3.378378  ...    1.754930      2.012395
5               Colorado     3.378378  ...    3.282196      2.843435
6            Connecticut     5.000000  ...    1.452587      4.277286
7               Delaware     4.409692  ...    2.134501      1.970434
8   District of Columbia     5.000000  ...    1.000000      1.000000
9                Florida     4.628118  ...    1.806412      2.213038
10               Georgia     4.628118  ...    1.513896      2.748559
11                Hawaii     3.902494  ...    2.891694      3.872309
12                 Idaho     1.090703  ...    2.978469      4.127419
13              Illinois     4.537415  ...    1.242970      1.888353
14               Indiana     4.537415  ...    2.368881      2.307914
15                  Iowa     2.088435  ...    3.298368      3.421122
16                Kansas     2.723356  ...    2.791375      2.160330
17              Kentucky     3.902494  ...    1.692890      4.133744
18             Louisiana     2.451247  ...    1.000000      1.000000
19                 Maine     3.448980  ...    2.535328      5.000000
20              Maryland     5.000000  ...    1.632194      1.046567

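I want to add another column to this df, Total, which is the result of adding up all the column values for each state (COL NAME0) and dividing by the sum of the weights in the weights dictionary. In addition, a column E should hold the same calculation, but only over the columns that carry that specific tag. The keys of the weights dictionary are the df's column names, and each value is a tuple containing the column's weight (used earlier, but not relevant to this question) and the category the column belongs to. Here is my current implementation: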

weights = {'COL NAME1': (2.14, 'E'), 'COL NAME2': (5.14, 'E'), 'COL NAME3': (10, 'G'), 'COL NAME4' : (5, 'E')}

eWeights = { key: value for key, value in weights.items() if value[1] == 'E'}
gWeights = { key: value for key, value in weights.items() if value[1] == 'G'}

#Total should be the result of adding each of the columns per COL NAME0 row 
#and dividing by the sum of the weight values. 

scaledData['Total'] = scaledData.sum(axis = 1, skipna = True)/ sum(list(weights.values())[0])

#Same calculation on only columns marked 'E'

for key in eWeights:
    scaledData['E'] = scaledData['E'] + scaledData[key]
    scaledData['E'] = scaledData['E'] / sum(list(eWeights.values())[0])

Unfortunately, the code above raises the following error (caused by the line that creates the Total column in scaledData):

TypeError: unsupported operand type(s) for +: 'float' and 'str'

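I have simplified scaledData and weights here, but any solution or advice would help me with my actual df, which has more rows and columns. Thanks for your help, and let me know if more information is needed.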


1 answer
User
#1 · Posted 2024-10-03 13:20:26

It looks like your DataFrame's values may not be stored as floats. Try:

for key in eWeights:
    scaledData['E'] = scaledData['E'].astype(float) + scaledData[key].astype(float)

    scaledData['E'] / sum(list(eWeights.values())[0])
    # should this be a print? Are you trying to set any values?
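
It may also be worth checking the divisor. list(weights.values())[0] is the first tuple, (2.14, 'E'), so calling sum() on it adds a float to a string, which produces exactly the TypeError quoted in the question. Below is a minimal sketch of one possible fix, assuming the intent is to divide each row sum by the sum of the numeric weights. The three-row DataFrame is made-up stand-in data, and the names total_weight, all_cols, e_cols and e_weight are hypothetical helpers, not part of the original code.

```python
import pandas as pd

# Hypothetical three-row stand-in for scaledData (values are illustrative only).
scaledData = pd.DataFrame({
    'COL NAME0': ['Alabama', 'Alaska', 'Arizona'],
    'COL NAME1': [4.0, 1.0, 4.0],
    'COL NAME2': [2.0, 1.0, 3.0],
    'COL NAME3': [2.0, 1.0, 2.0],
    'COL NAME4': [1.0, 1.0, 1.5],
})

weights = {'COL NAME1': (2.14, 'E'), 'COL NAME2': (5.14, 'E'),
           'COL NAME3': (10, 'G'), 'COL NAME4': (5, 'E')}

# list(weights.values())[0] is the tuple (2.14, 'E');
# summing it tries to add a float to a str and fails.
try:
    sum(list(weights.values())[0])
except TypeError as exc:
    error_message = str(exc)

# Sum only the numeric weight (first tuple element) of every entry.
total_weight = sum(w for w, _ in weights.values())

# Restrict the row sum to the weighted columns, so the state names
# in COL NAME0 are never part of the addition.
all_cols = list(weights)
scaledData['Total'] = scaledData[all_cols].sum(axis=1, skipna=True) / total_weight

# Same calculation on the columns tagged 'E' only.
e_cols = [col for col, (_, tag) in weights.items() if tag == 'E']
e_weight = sum(weights[col][0] for col in e_cols)
scaledData['E'] = scaledData[e_cols].sum(axis=1, skipna=True) / e_weight
```

Selecting the columns by name and summing them in one call avoids the loop, and it also avoids reading scaledData['E'] before that column exists. If some columns in the real df really are stored as strings, they would still need converting (for example with astype(float)) before the sum.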
