Python 2.7: DataFrame groupby and finding the percentage distribution of values within a group

Posted 2024-05-19 10:24:20


I have a DataFrame, and I want to find the percentage distribution of one column's values within a group.

An example of a group is df.groupby(['race', 'tyre', 'stint']).get_group(('Australian Grand Prix', 'Super soft', 1))

For each row in the group, I want that row's 'total diff' value expressed as a share of the group's total.

Here is the DataFrame in dictionary format. The real data contains many other groups, but the df below shows only the first group.

{'driverRef': {0: 'vettel',
  1: 'raikkonen',
  2: 'rosberg',
  4: 'hamilton',
  6: 'ricciardo',
  7: 'alonso',
  14: 'haryanto'},
 'race': {0: 'Australian Grand Prix',
  1: 'Australian Grand Prix',
  2: 'Australian Grand Prix',
  4: 'Australian Grand Prix',
  6: 'Australian Grand Prix',
  7: 'Australian Grand Prix',
  14: 'Australian Grand Prix'},
 'stint': {0: 1.0, 1: 1.0, 2: 1.0, 4: 1.0, 6: 1.0, 7: 1.0, 14: 1.0},
 'total diff': {0: 125147.50728499777,
  1: 281292.0366694695,
  2: 166278.41312954266,
  4: 64044.234019635056,
  6: 648383.28046950256,
  7: 400675.77449897071,
  14: 2846411.2560531585},
 'tyre': {0: u'Super soft',
  1: u'Super soft',
  2: u'Super soft',
  4: u'Super soft',
  6: u'Super soft',
  7: u'Super soft',
  14: u'Super soft'}}

1 answer
User
#1 · Posted 2024-05-19 10:24:20

If I understand your requirement correctly, this may help:

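# Sum 'total diff' within each (race, tyre, stint) group.
sums = df.groupby(['race', 'tyre', 'stint'])['total diff'].sum()
# Index df by the group keys so the sums align row by row, then restore the columns.
df = df.set_index(['race', 'tyre', 'stint']).assign(pct=sums).reset_index()
# Divide each row's value by its group total to get its share of the group.
df['pct'] = df['total diff'] / df['pct']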

#                     race        tyre  stint  driverRef    total diff       pct
# 0  Australian Grand Prix  Super soft    1.0     vettel  1.251475e+05  0.027613
# 1  Australian Grand Prix  Super soft    1.0  raikkonen  2.812920e+05  0.062065
# 2  Australian Grand Prix  Super soft    1.0    rosberg  1.662784e+05  0.036688
# 3  Australian Grand Prix  Super soft    1.0   hamilton  6.404423e+04  0.014131
# 4  Australian Grand Prix  Super soft    1.0  ricciardo  6.483833e+05  0.143060
# 5  Australian Grand Prix  Super soft    1.0     alonso  4.006758e+05  0.088406
# 6  Australian Grand Prix  Super soft    1.0   haryanto  2.846411e+06  0.628037
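
The same shares can probably be computed without the set_index/reset_index round trip. The following is a minimal sketch, not part of the original answer, which assumes that groupby(...).transform('sum') is acceptable for this use case. The names data and group_total are illustrative only, and the sample values are copied from the question's first group.

```python
import pandas as pd

# Sample data copied from the question (first group only).
rows = (0, 1, 2, 4, 6, 7, 14)
data = {
    'driverRef': {0: 'vettel', 1: 'raikkonen', 2: 'rosberg', 4: 'hamilton',
                  6: 'ricciardo', 7: 'alonso', 14: 'haryanto'},
    'race': {i: 'Australian Grand Prix' for i in rows},
    'stint': {i: 1.0 for i in rows},
    'total diff': {0: 125147.50728499777, 1: 281292.0366694695,
                   2: 166278.41312954266, 4: 64044.234019635056,
                   6: 648383.28046950256, 7: 400675.77449897071,
                   14: 2846411.2560531585},
    'tyre': {i: 'Super soft' for i in rows},
}
df = pd.DataFrame(data)

# transform('sum') returns the group total aligned to each original row,
# so the division works directly and the original index is preserved.
group_total = df.groupby(['race', 'tyre', 'stint'])['total diff'].transform('sum')
df['pct'] = df['total diff'] / group_total

print(df[['driverRef', 'total diff', 'pct']])
```

Unlike the answer above, this variant keeps the original row labels (0, 1, 2, 4, ...) and column order, which may or may not matter for later processing. Within each group the pct values sum to 1.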
