Show the top 5 rows of each group in a DataFrame with multiple numeric columns

Posted 2024-10-05 10:40:08

I am working with the following DataFrame: df.groupby(['departamento','campo']).describe()

df_statistics:

                            produccion                                         
                                mean           std          min           max
departamento campo                                                           
f7fd2c4f     8dd7c41b    4714.695603   1076.940951  3091.015553   6378.546534
             82edafb9    1851.291482    841.512944   675.814722   3006.476183
             58a0d8ca    1768.151315    347.896113  1033.459536   2242.544338
             8ba362f3     257.917212    231.490925     0.000000    497.916659
             4f4a249f     192.811711     80.299111   129.190598    356.437730
             741abe20     431.717352     71.053604   291.831556    529.518332
             51cbb05d     489.804186     65.542073   353.186216    582.869264
             4d0fb45e     358.597250     30.166391   314.168045    407.842103
             c98bd9dd     437.244383     27.135823   402.546159    481.245852
             7eb34927     106.426374     22.579237    81.994706    142.283652
ec12ad00     44502c89      15.015145     11.467353     0.000000     29.241879
             5558f26e       1.107400      0.959445     0.000000      2.762156
             85c1a0e5       0.122720      0.425113     0.000000      1.472635
cf33cb8a     2f614c0b   12458.858168  12042.715975   150.635367  25999.977584
             5559f8d7    4272.447078   1326.999765  2458.231739   6059.658900
             fd6f6562    3378.712031   1194.101786   869.763739   4814.220212
             febb6cf6    4149.936221    833.663173  2471.139924   5827.822674
             d56beadb     474.831361    810.840341     0.000000   2283.465569
             124207de    3863.484888    796.945367  2713.111304   5150.735620
             1fd2689f    6099.963902    768.102604  4766.241346   7897.993261
             c728bf96    3361.623457    704.293795  2203.721911   4949.989960

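I have already sorted the DataFrame by the standard deviation ('std') column, but I only want to show the top 5 rows of each group in the 'departamento' column.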

I tried the following code: df_statistics.nlargest(5, columns=('produccion','std'))

But this gives me the top 5 rows across the whole 'departamento' column instead of the top 5 per group:

                            produccion                                         
                               mean           std          min           max
departamento campo                                                          
cf33cb8a     2f614c0b  12458.858168  12042.715975   150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765  2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786   869.763739   4814.220212
f7fd2c4f     8dd7c41b   4714.695603   1076.940951  3091.015553   6378.546534
             82edafb9   1851.291482    841.512944   675.814722   3006.476183

How can I show the top 5 rows of each group based on the 'std' column?


Tags: data, df, min, mean, max, std, groupby, describe

2 Answers

IIUC

df_statistics.groupby('departamento').head(5)

Output:

                         produccion                                         
                               mean           std          min           max
departamento campo                                                          
f7fd2c4f     8dd7c41b   4714.695603   1076.940951  3091.015553   6378.546534
             82edafb9   1851.291482    841.512944   675.814722   3006.476183
             58a0d8ca   1768.151315    347.896113  1033.459536   2242.544338
             8ba362f3    257.917212    231.490925     0.000000    497.916659
             4f4a249f    192.811711     80.299111   129.190598    356.437730
ec12ad00     44502c89     15.015145     11.467353     0.000000     29.241879
             5558f26e      1.107400      0.959445     0.000000      2.762156
             85c1a0e5      0.122720      0.425113     0.000000      1.472635
cf33cb8a     2f614c0b  12458.858168  12042.715975   150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765  2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786   869.763739   4814.220212
             febb6cf6   4149.936221    833.663173  2471.139924   5827.822674
             d56beadb    474.831361    810.840341     0.000000   2283.465569
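Note that groupby(...).head(n) keeps the first n rows of each group in whatever order the frame already has, so the result above is only the "top 5 by std" because the frame was sorted beforehand. A minimal sketch of the difference, using a small made-up frame (the name `stats` and all numbers are hypothetical; only the index and column labels mirror the question):

```python
import pandas as pd

# Hypothetical statistics frame; values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [("A", "a1"), ("A", "a2"), ("A", "a3"), ("B", "b1"), ("B", "b2")],
    names=["departamento", "campo"],
)
columns = pd.MultiIndex.from_tuples([("produccion", "mean"), ("produccion", "std")])
stats = pd.DataFrame(
    [[10.0, 1.0], [20.0, 9.0], [30.0, 5.0], [40.0, 2.0], [50.0, 7.0]],
    index=index,
    columns=columns,
)

# DataFrame.nlargest ignores the groups: it returns the n largest rows overall.
global_top = stats.nlargest(2, columns=("produccion", "std"))

# groupby(...).head(n) is per group, but it takes rows in their current order.
# Here the frame is NOT sorted by std, so ("A", "a1") with std 1.0 is kept
# while ("A", "a3") with std 5.0 is dropped.
per_group_first = stats.groupby("departamento").head(2)

print(global_top)
print(per_group_first)
```

So if the frame is not already ordered by std within each group, sort it first before calling head.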

@接收频率 is correct.

df_statistics.sort_values(by=('produccion',  'std'), ascending=False)\
  .groupby('departamento')\
  .head(5)\
  .sort_index()

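First sort the DataFrame by std in descending order, then use groupby with head to keep the first 5 rows of each group, and finally call sort_index to bring the rows of each group back together.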
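For reference, here is the whole chain run end to end on a tiny data set. This is only a sketch: the raw frame `raw` and its values are made up, and n is set to 2 instead of 5 so the effect is visible on so few rows.

```python
import pandas as pd

# Hypothetical raw data with the same column names as in the question.
raw = pd.DataFrame({
    "departamento": ["A"] * 6 + ["B"] * 4,
    "campo": ["a1", "a1", "a2", "a2", "a3", "a3", "b1", "b1", "b2", "b2"],
    "produccion": [1.0, 3.0, 10.0, 20.0, 5.0, 6.0, 2.0, 2.0, 0.0, 8.0],
})

# describe() on the grouped frame yields MultiIndex columns such as
# ('produccion', 'mean') and ('produccion', 'std').
stats = raw.groupby(["departamento", "campo"]).describe()

top2 = (
    stats.sort_values(by=("produccion", "std"), ascending=False)  # largest std first
    .groupby("departamento")                                      # group on the index level
    .head(2)                                                      # first 2 rows per group
    .sort_index()                                                 # regroup rows by index
)

print(top2[("produccion", "std")])
```

One thing to be aware of: the final sort_index() orders the rows by departamento and then by campo, so within a group the rows are no longer ordered by std. If the std order matters, leave out the last step.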

Alternatively, use another groupby:

df_statistics.groupby('departamento')\
             .apply(lambda grp: grp.nlargest(5, columns=('produccion', 'std')))
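A caveat with the apply variant: with the default group_keys=True, pandas prepends the group key to the result, so 'departamento' ends up in the index twice. Passing group_keys=False to groupby avoids that. A small sketch, again on a made-up frame (the name `stats`, the helper `top_by_std` and the numbers are hypothetical, and n is 2 to keep it short):

```python
import pandas as pd

# Hypothetical statistics frame; values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [("A", "a1"), ("A", "a2"), ("A", "a3"), ("B", "b1"), ("B", "b2")],
    names=["departamento", "campo"],
)
columns = pd.MultiIndex.from_tuples([("produccion", "mean"), ("produccion", "std")])
stats = pd.DataFrame(
    [[10.0, 1.0], [20.0, 9.0], [30.0, 5.0], [40.0, 2.0], [50.0, 7.0]],
    index=index,
    columns=columns,
)


def top_by_std(grp, n=2):
    """Return the n rows of one group with the largest std."""
    return grp.nlargest(n, columns=("produccion", "std"))


# Default: the group key is added as an extra outer index level.
with_keys = stats.groupby("departamento").apply(top_by_std)

# group_keys=False keeps the original (departamento, campo) index.
without_keys = stats.groupby("departamento", group_keys=False).apply(top_by_std)

print(with_keys.index.nlevels, without_keys.index.nlevels)
print(without_keys)
```

Unlike the sort-then-head approach, this does not depend on the frame being sorted beforehand, and the rows of each group come back ordered by std.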
