Collecting summary statistics for a DataFrame built by randomly sampling other DataFrames

Posted 2024-10-04 07:33:37


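My goal is to build a DataFrame by randomly sampling from other DataFrames, collect summary statistics on the new DataFrame, and then append those statistics to a list. Ideally, I could repeat this process n times (e.g., as a bootstrap).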

dfposlist = [OFdf, Firstdf, Seconddf, Thirddf, CFdf, RFdf, Cdf, SSdf]

OFdf.head()
      playerID        OPW POS    salary
87   bondsba01  62.061290  OF   8541667
785  ramirma02  35.785630  OF  13050000
966  walkela01  30.644305  OF   6050000
859  sheffga01  29.090699  OF   9916667
357  gilesbr02  28.160054  OF   7666666

All the DataFrames in the list have the same header. What I am trying to do looks like this:

teamdist = []
for df in dfposlist:
    frames = [df.sample(n=1)]
team = pd.concat(frames)

teamopw = team['OPW'].sum()
teamsal = team['salary'].sum()
teamplayers = team['playerID'].tolist()

teamdic = {'Salary':teamsal, 'OPW':teamopw, 'Players':teamplayers}
teamdist.append(teamdic)

The output I want looks like this:

teamdist = [{'Salary':4900000, 'OPW':78.452, 'Players':[bondsba01, etc, etc]}]

But for some reason, every summing operation such as teamopw = team['OPW'].sum() fails to work the way I expect and returns only the individual elements of team['OPW']:

print(teamopw)
0.17118131814601256
38.10700006434629
1.5699939126695253
32.9068837019903
16.990760776263674
18.22428871113601
13.447706356730897

Any suggestions for getting this to work? Thanks!

Edit: a working solution is below. I'm not sure it is the most Pythonic way, but it works.

teamdist = []
team = pd.concat([df.sample(n=1) for df in dfposlist])

teamopw = team[['OPW']].values.sum()
teamsal = team[['salary']].values.sum()
teamplayers = team['playerID'].tolist()

teamdic = {'Salary':teamsal, 'OPW':teamopw, 'Players':teamplayers}
teamdist.append(teamdic)
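
A likely explanation for the original behaviour, assuming the summing lines actually ran inside the for loop (the printed per-row values suggest so): frames was rebuilt on every iteration, so team never held more than one sampled row. The sketch below uses two small made-up frames (of_df, cf_df and their values are hypothetical) to contrast the two patterns:

```python
import pandas as pd

# Hypothetical stand-ins for two of the position frames; the values are
# identical within each frame so the result does not depend on the sample.
of_df = pd.DataFrame({"playerID": ["a01", "b01"], "OPW": [10.0, 10.0], "salary": [100, 100]})
cf_df = pd.DataFrame({"playerID": ["c01", "d01"], "OPW": [5.0, 5.0], "salary": [50, 50]})
dfposlist = [of_df, cf_df]

# Pattern from the question: frames is reset on each pass, so every
# "team" contains a single row and the sum is just that row's value.
per_iteration_sums = []
for df in dfposlist:
    frames = [df.sample(n=1)]
    team = pd.concat(frames)
    per_iteration_sums.append(team["OPW"].sum())

# Fixed pattern: sample one row from every frame, then concatenate once.
team = pd.concat([df.sample(n=1) for df in dfposlist])
teamopw = team["OPW"].sum()
teamsal = team["salary"].sum()
```

Once team holds one row per position, the plain column sum team['OPW'].sum() should be enough; going through team[['OPW']].values.sum() gives the same number.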

1 answer

Answer 1 · posted 2024-10-04 07:33:37

Here you go (using random data):

import pandas as pd
import numpy as np

dfposlist = dict(zip(range(10),
                     [pd.DataFrame(np.random.randn(10, 5),
                                   columns=list('abcde'))
                     for i in range(10)]))
for df in dfposlist.values():
    df['f'] = list('qrstuvwxyz')

teamdist = []
team = pd.concat([df.sample(n=1) for df in dfposlist.values()])
print(team.info())

teamdic = team[['a', 'c', 'e']].sum().to_dict()
teamdic['f'] = team['f'].tolist()
teamdist.append(teamdic)
print(teamdist)

# Output:
## team.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 1 to 6
Data columns (total 6 columns):
a    10 non-null float64
b    10 non-null float64
c    10 non-null float64
d    10 non-null float64
e    10 non-null float64
f    10 non-null object
dtypes: float64(5), object(1)
memory usage: 560.0+ bytes
None

## teamdist:
[{'a': -3.5380097363724601,
  'c': 2.0951152809401776,
  'e': 3.1439230427971863,
  'f': ['r', 'w', 'z', 'v', 'x', 'q', 't', 'q', 'v', 'w']}]
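
The question also asks about repeating the draw n times, bootstrap-style, which the answer does not show. One possible way is to wrap the sampling step in a function and call it in a loop. This is only a sketch on the same kind of random data: sample_team, n_iterations and summary_df are names made up for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def sample_team(frames, sum_cols, list_col):
    """Draw one random row from each frame and summarise the resulting team."""
    team = pd.concat([df.sample(n=1) for df in frames])
    summary = team[sum_cols].sum().to_dict()     # column totals, e.g. OPW and salary
    summary[list_col] = team[list_col].tolist()  # who was drawn, e.g. playerID
    return summary

# Random stand-in data shaped like the answer's example (10 frames of 10 rows).
frames = []
for _ in range(10):
    df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))
    df["f"] = list("qrstuvwxyz")
    frames.append(df)

# Repeat the draw n times and keep one summary dict per draw.
n_iterations = 100  # hypothetical choice
teamdist = [sample_team(frames, ["a", "c", "e"], "f") for _ in range(n_iterations)]

# Optional: a DataFrame makes it easy to inspect the resulting distribution.
summary_df = pd.DataFrame(teamdist)
print(summary_df[["a", "c", "e"]].describe())
```

With the question's data, the call would presumably look like sample_team(dfposlist, ['OPW', 'salary'], 'playerID'), with dfposlist being the original list of position frames.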
