Background: I have a pandas DataFrame, scaledData, which is just a standard df of information, as shown below:
COL NAME0 COL NAME1 ... COL NAME3 COL NAME4
0 Alabama 4.099099 ... 2.042345 1.392755
1 Alaska 1.396396 ... 1.000000 1.000000
2 Arizona 4.189189 ... 2.003257 1.537777
3 Arkansas 2.927928 ... 2.208723 1.007370
4 California 3.378378 ... 1.754930 2.012395
5 Colorado 3.378378 ... 3.282196 2.843435
6 Connecticut 5.000000 ... 1.452587 4.277286
7 Delaware 4.409692 ... 2.134501 1.970434
8 District of Columbia 5.000000 ... 1.000000 1.000000
9 Florida 4.628118 ... 1.806412 2.213038
10 Georgia 4.628118 ... 1.513896 2.748559
11 Hawaii 3.902494 ... 2.891694 3.872309
12 Idaho 1.090703 ... 2.978469 4.127419
13 Illinois 4.537415 ... 1.242970 1.888353
14 Indiana 4.537415 ... 2.368881 2.307914
15 Iowa 2.088435 ... 3.298368 3.421122
16 Kansas 2.723356 ... 2.791375 2.160330
17 Kentucky 3.902494 ... 1.692890 4.133744
18 Louisiana 2.451247 ... 1.000000 1.000000
19 Maine 3.448980 ... 2.535328 5.000000
20 Maryland 5.000000 ... 1.632194 1.046567
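I want to create another column in this df, Total, which is the result of adding up all the column values for each state (COL NAME0) and dividing by the sum of the weights in the weights dictionary. In addition, a column E should hold the same calculation, but only over the columns that carry that specific tag. The keys of the weights dictionary are the df's column names, and each value is a tuple containing the column's weight (used earlier, but not relevant to this question) and the category the column belongs to. Here is my current implementation: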
weights = {'COL NAME1': (2.14, 'E'), 'COL NAME2': (5.14, 'E'), 'COL NAME3': (10, 'G'), 'COL NAME4' : (5, 'E')}
eWeights = { key: value for key, value in weights.items() if value[1] == 'E'}
gWeights = { key: value for key, value in weights.items() if value[1] == 'G'}
#Total should be the result of adding each of the columns per COL NAME0 row
#and dividing by the sum of the weight values.
scaledData['Total'] = scaledData.sum(axis = 1, skipna = True)/ sum(list(weights.values())[0])
#Same calculation on only columns marked 'E'
for key in eWeights:
    scaledData['E'] = scaledData['E'] + scaledData[key]
scaledData['E'] = scaledData['E'] / sum(list(eWeights.values())[0])
Unfortunately, the code above produces the following error (raised by the line that creates the Total column in scaledData):
TypeError: unsupported operand type(s) for +: 'float' and 'str'
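I have simplified scaledData and weights, but any solution or suggestion will help me with the real df, which has many more rows and columns. Thanks for your help, and let me know if you need more information.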
Your DataFrame appears to be stored as float. Try:
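As a minimal sketch of one possible fix (not necessarily the code this answer originally proposed): the traceback is consistent with what Python's built-in sum() raises when it is handed a (weight, category) tuple such as (2.14, 'E'), because it tries to add the float to the string. The version below sums only the first element of each tuple and restricts the row sums to the weighted columns, so the state names in COL NAME0 are never added. The three-row frame is a stand-in for the real scaledData, the COL NAME2 values are made up because the question elides that column, and the names totalWeight and eWeight are hypothetical.

```python
import pandas as pd

# Stand-in for scaledData: first three rows from the question.
# COL NAME2 is not shown in the question, so its values here are invented.
scaledData = pd.DataFrame({
    'COL NAME0': ['Alabama', 'Alaska', 'Arizona'],
    'COL NAME1': [4.099099, 1.396396, 4.189189],
    'COL NAME2': [1.0, 2.0, 3.0],  # hypothetical values
    'COL NAME3': [2.042345, 1.000000, 2.003257],
    'COL NAME4': [1.392755, 1.000000, 1.537777],
})

weights = {'COL NAME1': (2.14, 'E'), 'COL NAME2': (5.14, 'E'),
           'COL NAME3': (10, 'G'), 'COL NAME4': (5, 'E')}
eWeights = {key: value for key, value in weights.items() if value[1] == 'E'}

# Add up only the numeric weight (first tuple element) of every entry.
# sum(list(weights.values())[0]) instead sums the single tuple (2.14, 'E').
totalWeight = sum(weight for weight, _ in weights.values())
eWeight = sum(weight for weight, _ in eWeights.values())

# Select the weighted columns explicitly so string columns are excluded.
scaledData['Total'] = scaledData[list(weights)].sum(axis=1, skipna=True) / totalWeight

# Same calculation over the columns tagged 'E'; no loop is needed, and
# scaledData['E'] does not have to exist beforehand.
scaledData['E'] = scaledData[list(eWeights)].sum(axis=1, skipna=True) / eWeight

print(scaledData[['COL NAME0', 'Total', 'E']])
```

Selecting columns by the dictionary keys also means that Total, once created, is not accidentally included when E is computed. In the question's loop, scaledData['E'] is read before it has been assigned, which would raise a KeyError on its own.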