如何从已排序的数据帧创建此表?

2024-10-03 13:30:16 发布

您现在位置:Python中文网/ 问答频道 /正文

我已经对数据框进行了排序,现在我想创建这个表: output

数据样本:

df4 = {'category': {0: 'HC', 1: 'HC', 2: 'HC', 3: 'AMG HC', 4: 'MUP', 5: 'MUP', 6: 'AMG MUP', 7: 'AMG MUP', 8: 'AMG MUP', 9: 'S', 10: 'S', 11: 'AMG S', 12: 'AMG S'}, 'segment': {0: 'Online', 1: 'Offline', 2: 'Independent', 3: 'Online', 4: 'Online', 5: 'Online', 6: 'Online', 7: 'Offline', 8: 'Independent', 9: 'Online', 10: 'Offline', 11: 'Online', 12: 'Offline'}, 'ytd2018': {0: '1 000', 1: '3 000', 2: '800', 3: '500', 4: '2 000', 5: '1 200', 6: '500', 7: '1 020', 8: '310', 9: '1 500', 10: '2 300', 11: '400', 12: '1 080'}, 'ytd2019': {0: '1 100', 1: '2 800', 2: '920', 3: '450', 4: '1 800', 5: '1 080', 6: '600', 7: '1 020', 8: '440', 9: '2 000', 10: '2 200', 11: '650', 12: '1 100'}, 'Evolution': {0: '10%', 1: '-7%', 2: '15%', 3: '-10%', 4: '-10%', 5: '-10%', 6: '20%', 7: '0%', 8: '42%', 9: '33%', 10: '-4%', 11: '63%', 12: '2%'}}

注意:我已经格式化了Evolution、ytd2018和ytd2019列,所以它们是字符串

我尝试过以下代码:

df4 = (df4.reset_index(drop=True)
   .set_index(['category', 'segment'], append=True)
   .unstack()
   .swaplevel(axis=1)
   .sort_index(level=0, axis=1, ascending=False)
   .reindex(['ytd2018, 'ytd2019', 'Evolution'], level=1, axis=1)
#   .sort_index(level=0, axis=0)
)

但它在段中打印具有空值的重复行,例如HC online而不脱机和独立,然后HC offline而不联机和独立等

提前谢谢你


Tags: 数据hcindexsegmentlevelonlinecategoryoffline
3条回答

您可以使用pivot_tablefirst作为聚合函数:

(df4.pivot_table(index='category',
                 columns='segment',
                 values=['ytd2018', 'ytd2019', 'Evolution'],
                 aggfunc='first')
    .swaplevel(axis=1)
    .sort_index(level=0, axis=1, ascending=False)
    .reindex(['ytd2018', 'ytd2019', 'Evolution'], level=1, axis=1)
)

输出:

segment   Online                   Offline                   Independent                  
         ytd2018 ytd2019 Evolution ytd2018 ytd2019 Evolution     ytd2018 ytd2019 Evolution
category                                                                                  
AMG HC       500     450      -10%     NaN     NaN       NaN         NaN     NaN       NaN
AMG MUP      500     600       20%   1 020   1 020        0%         310     440       42%
AMG S        400     650       63%   1 080   1 100        2%         NaN     NaN       NaN
HC         1 000   1 100       10%   3 000   2 800       -7%         800     920       15%
MUP        2 000   1 800      -10%     NaN     NaN       NaN         NaN     NaN       NaN
S          1 500   2 000       33%   2 300   2 200       -4%         NaN     NaN       NaN

您可以这样做:

df5 = (df4.reset_index(drop=True)
          .set_index(['category', 'segment'], append=True)
          .unstack()
          .swaplevel(axis=1)
          .sort_index(level=0, axis=1, ascending=False)
          .reindex(['ytd2018', 'ytd2019', 'Evolution'], level=1, axis=1)
      )

df6 = df5.groupby(level='category', sort=False).first()

结果:

print(df6)



segment   Online                   Offline                   Independent                  
         ytd2018 ytd2019 Evolution ytd2018 ytd2019 Evolution     ytd2018 ytd2019 Evolution
category                                                                                  
HC         1 000   1 100       10%   3 000   2 800       -7%         800     920       15%
AMG HC       500     450      -10%    None    None      None        None    None      None
MUP        2 000   1 800      -10%    None    None      None        None    None      None
AMG MUP      500     600       20%   1 020   1 020        0%         310     440       42%
S          1 500   2 000       33%   2 300   2 200       -4%        None    None      None
AMG S        400     650       63%   1 080   1 100        2%        None    None      None

最后按level=1对结果进行分组,然后调用first以获得所需的结果:

>>>  (df4.set_index(['category', 'segment'], append=True)
         .unstack()
         .swaplevel(axis=1)
         .sort_index(level=0, axis=1, ascending=False)
         .reindex(['ytd2018', 'ytd2019', 'Evolution'], level=1, axis=1)
         .groupby(level=1, sort=False)
         .first()
         )

输出:

segment   Online                   Offline                   Independent  \
         ytd2018 ytd2019 Evolution ytd2018 ytd2019 Evolution     ytd2018   
category                                                                   
HC         1 000   1 100       10%   3 000   2 800       -7%         800   
AMG HC       500     450      -10%    None    None      None        None   
MUP        2 000   1 800      -10%    None    None      None        None   
AMG MUP      500     600       20%   1 020   1 020        0%         310   
S          1 500   2 000       33%   2 300   2 200       -4%        None   
AMG S        400     650       63%   1 080   1 100        2%        None   
segment                     
         ytd2019 Evolution  
category                    
HC           920       15%  
AMG HC      None      None  
MUP         None      None  
AMG MUP      440       42%  
S           None      None  
AMG S       None      None  

相关问题 更多 >