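My goal is to build a dataframe by randomly sampling from other dataframes, collect summary statistics on the new dataframe, and then append those statistics to a list. Ideally, I could repeat this process n times (e.g. as a bootstrap).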
dfposlist = [OFdf, Firstdf, Seconddf, Thirddf, CFdf, RFdf, Cdf, SSdf]
OFdf.head()
      playerID        OPW POS    salary
87   bondsba01  62.061290  OF   8541667
785  ramirma02  35.785630  OF  13050000
966  walkela01  30.644305  OF   6050000
859  sheffga01  29.090699  OF   9916667
357  gilesbr02  28.160054  OF   7666666
All of the dataframes in the list have the same header. What I am trying to do looks like this:
teamdist = []
for df in dfposlist:
    frames = [df.sample(n=1)]
    team = pd.concat(frames)
    teamopw = team['OPW'].sum()
    teamsal = team['salary'].sum()
    teamplayers = team['playerID'].tolist()
    teamdic = {'Salary':teamsal, 'OPW':teamopw, 'Players':teamplayers}
    teamdist.append(teamdic)
The output I want looks like this:
teamdist = [{'Salary':4900000, 'OPW':78.452, 'Players':[bondsba01, etc, etc]}]
But for some reason, summing operations such as teamopw = team['OPW'].sum() do not work the way I hoped; they only return the individual elements of team['OPW']:
print(teamopw)
0.17118131814601256
38.10700006434629
1.5699939126695253
32.9068837019903
16.990760776263674
18.22428871113601
13.447706356730897
Any suggestions for getting this to work? Thanks!
Edit: the working solution is below. I'm not sure it's the most Pythonic way, but it works.
teamdist = []
team = pd.concat([df.sample(n=1) for df in dfposlist])
teamopw = team[['OPW']].values.sum()
teamsal = team[['salary']].values.sum()
teamplayers = team['playerID'].tolist()
teamdic = {'Salary':teamsal, 'OPW':teamopw, 'Players':teamplayers}
teamdist.append(teamdic)
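The original loop most likely printed individual values because frames was rebuilt on every iteration, so each team held a single row, and summing one row just returns that row's value. The sketch below illustrates this with two tiny made-up frames (the names a and b and all their values are assumptions for illustration, not data from the question). It also suggests that, once the concatenation happens before the sum, the plain team['OPW'].sum() should give the same total as team[['OPW']].values.sum().

```python
import pandas as pd

# Two tiny made-up position frames (illustrative values only).
a = pd.DataFrame({'playerID': ['p1', 'p2'], 'OPW': [1.0, 2.0], 'salary': [100, 200]})
b = pd.DataFrame({'playerID': ['p3', 'p4'], 'OPW': [3.0, 4.0], 'salary': [300, 400]})

# Summing inside the loop: each "team" has exactly one row,
# so the "sum" is just that single row's OPW.
per_frame_sums = []
for df in [a, b]:
    team = pd.concat([df.sample(n=1, random_state=0)])
    per_frame_sums.append(team['OPW'].sum())

# Sampling one row from every frame first, then summing once,
# gives a single team-level total.
team = pd.concat([df.sample(n=1, random_state=0) for df in [a, b]])
total = team['OPW'].sum()

print(per_frame_sums)  # one value per position frame
print(total)           # one value for the whole team
```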
Here (with random data):
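A minimal sketch of how the whole procedure might be repeated n times, assuming randomly generated position frames with the same columns as in the question. The helpers make_position_frame and sample_team, the position labels, and the value ranges are all hypothetical and only serve to make the example self-contained. A single shared np.random.RandomState is passed to sample so that the draws from different frames are not tied to one another.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)  # shared random state, seeded for reproducibility

def make_position_frame(pos, n_players=5):
    """Build a made-up frame with the question's columns (hypothetical data)."""
    return pd.DataFrame({
        'playerID': ['{}{:02d}'.format(pos.lower(), i) for i in range(n_players)],
        'OPW': rs.uniform(0, 40, n_players),
        'POS': pos,
        'salary': rs.randint(300000, 15000000, n_players),
    })

positions = ['OF', '1B', '2B', '3B', 'CF', 'RF', 'C', 'SS']
dfposlist = [make_position_frame(p) for p in positions]

def sample_team(frames, random_state):
    """Draw one player per position frame and summarise the resulting team."""
    team = pd.concat([df.sample(n=1, random_state=random_state) for df in frames])
    return {
        'Salary': int(team['salary'].sum()),
        'OPW': float(team['OPW'].sum()),
        'Players': team['playerID'].tolist(),
    }

n = 100  # number of bootstrap repetitions
teamdist = [sample_team(dfposlist, rs) for _ in range(n)]

# Summary statistics over all sampled teams.
summary = pd.DataFrame(teamdist)[['Salary', 'OPW']].describe()
print(summary)
```

Each entry of teamdist has the shape shown in the desired output above, and summary collects the count, mean, spread and quantiles of Salary and OPW across the n sampled teams.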