Pandas: How do I run a pivot with a multi-index?

import pandas as pd
import numpy as np

df = pd.DataFrame()
month = np.arange(1, 13)
values1 = np.random.randint(0, 100, 12)
values2 = np.random.randint(200, 300, 12)

df['month'] = np.hstack((month, month))
df['year'] = 2004
df['value'] = np.hstack((values1, values2))
df['item'] = np.hstack((np.repeat('item 1', 12), np.repeat('item 2', 12)))

# This doesn't work:
# ValueError: Wrong number of items passed 24, placement implies 2
# mypiv = df.pivot(['year', 'month'], 'item', 'value')

# This doesn't work, either:
# df.set_index(['year', 'month'], inplace=True)
# ValueError: cannot label index with a null key
# mypiv = df.pivot(columns='item', values='value')

# This below works but is not ideal:
# I have to first concatenate then separate the fields I need
df['new field'] = df['year'] * 100 + df['month']
mypiv = df.pivot(index='new field', columns='item', values='value').reset_index()
# Integer division, so 'year' stays an integer under Python 3
mypiv['year'] = mypiv['new field'] // 100
mypiv['month'] = mypiv['new field'] % 100

2 Answers

User

#1 · edited 2024-07-02 04:12:08

You can group by all three keys and then unstack the item level.

>>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209
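
For anyone who wants to run this end to end, here is a minimal sketch of the same groupby/unstack approach. It rebuilds the question's frame with a seeded generator (the seed and the names `wide` and `flat` are my own choices, not from the question), and the last two lines show one way to get `year` and `month` back as ordinary columns.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is an assumption added for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(np.arange(1, 13), 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.concatenate([rng.integers(0, 100, 12), rng.integers(200, 300, 12)]),
})

# Group on all three keys, then move the innermost level ('item') into the columns.
wide = df.groupby(["year", "month", "item"])["value"].sum().unstack("item")

# Optional: turn the (year, month) MultiIndex back into ordinary columns.
flat = wide.reset_index()
flat.columns.name = None  # drop the leftover 'item' label on the column axis
print(flat.head())
```

Since every (year, month, item) combination occurs once here, `sum()` just passes each value through; it only matters when a combination is repeated.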

Or use pivot_table:

>>> df.pivot_table(
...     values='value',
...     index=['year', 'month'],
...     columns='item',
...     aggfunc='sum')
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209
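
A small sketch of why the `aggfunc` argument is worth spelling out: `pivot_table` aggregates repeated key combinations, and as far as I know its default is the mean. The four-row frame below is hypothetical, made up only to include one duplicated (year, month, item) combination.

```python
import pandas as pd

# Hypothetical frame: (2004, 1, 'item 1') appears twice.
df = pd.DataFrame({
    "year":  [2004, 2004, 2004, 2004],
    "month": [1, 1, 1, 2],
    "item":  ["item 1", "item 1", "item 2", "item 1"],
    "value": [10, 30, 200, 5],
})

keys = ["year", "month"]

# Explicit sum: the duplicated pair is added up.
summed = df.pivot_table(values="value", index=keys, columns="item", aggfunc="sum")

# No aggfunc given: pivot_table falls back to its default, the mean.
averaged = df.pivot_table(values="value", index=keys, columns="item")

print(summed)
print(averaged)
```

With this data the duplicated pair comes out as 40 under the sum and 20 under the default, so the two calls only agree when each key combination is unique, as it is in the question's data.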

User

#2 · edited 2024-07-02 04:12:08

I believe that if you include item in your MultiIndex, you can simply unstack:

df.set_index(['year', 'month', 'item']).unstack(level=-1)

This yields:

                value      
item       item 1 item 2
year month              
2004 1         21    277
     2         43    244
     3         12    262
     4         80    201
     5         22    287
     6         52    284
     7         90    249
     8         14    229
     9         52    205
     10        76    207
     11        88    259
     12        90    200
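
Note the extra `value` level on top of the columns in that output. If you would rather have single-level columns, one option (a sketch, with a seeded copy of the question's frame and my own variable names) is to select the column before unstacking:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an assumption, added for repeatability
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(np.arange(1, 13), 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.concatenate([rng.integers(0, 100, 12), rng.integers(200, 300, 12)]),
})

indexed = df.set_index(["year", "month", "item"])

# Unstacking the whole frame keeps 'value' as an outer column level.
nested = indexed.unstack(level=-1)
print(nested.columns.nlevels)

# Selecting the column first gives a Series, so unstack returns single-level columns.
flat = indexed["value"].unstack(level=-1)
print(flat.columns.nlevels)
```

Unlike the groupby and pivot_table versions, this route does no aggregation, so it raises a ValueError if any (year, month, item) combination is repeated.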

It is a bit faster than using pivot_table, and about the same speed as, or slightly slower than, using groupby.
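
If you want to check the relative speed on your own machine, a rough sketch with `timeit` could look like this. The frame is a seeded copy of the question's data, the repeat counts are arbitrary, and the numbers will vary with pandas version and data size, so treat the result as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an assumption, added for repeatability
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(np.arange(1, 13), 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.concatenate([rng.integers(0, 100, 12), rng.integers(200, 300, 12)]),
})

# The three approaches from the answers, wrapped as zero-argument callables.
candidates = {
    "groupby + unstack": lambda: df.groupby(["year", "month", "item"])["value"].sum().unstack("item"),
    "pivot_table": lambda: df.pivot_table(values="value", index=["year", "month"],
                                          columns="item", aggfunc="sum"),
    "set_index + unstack": lambda: df.set_index(["year", "month", "item"]).unstack(level=-1),
}

calls = 20  # calls per timing run; arbitrary
timings = {name: min(timeit.repeat(fn, number=calls, repeat=3))
           for name, fn in candidates.items()}

# Print the fastest approach first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>20}: {seconds / calls * 1e3:.3f} ms per call")
```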
