Pandas: Turning one column into multiple columns

Posted 2024-09-28 23:47:58


Here is the data I have:

            cmte_id   trans   entity    st  amount    fec_id
date                        
2007-08-15  C00112250   24K     ORG     DC  2000    C00431569
2007-09-26  C00119040   24K     CCM     FL  1000    C00367680
2007-09-26  C00119040   24K     CCM     MD  1000    C00140715
2007-07-20  C00346296   24K     CCM     CA  1000    C00434571
2007-09-24  C00346296   24K     CCM     MA  1000    C00433136

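For brevity, I have left out the other descriptive columns. I want to transform the data so that the values in [cmte_id] become column headers and the values in [amount] become the corresponding values in those new columns. I know this is probably a simple pivot operation. I tried the following: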

[code not preserved]

The desired end result (apart from the additional columns such as 'trans', 'fec_id', 'st', etc.) would look like this:

date        C00112250   C00119040   C00119040   C00346296   C00346296
2007-08-15  2000
2007-09-26              1000
2007-09-26                          1000
2007-07-20                                      1000
2007-09-24                                                  1000

Does anyone know how I can get closer to that end result?
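A minimal sketch, not taken from the answer below, that rebuilds the sample rows by hand (only `date`, `cmte_id` and `amount`) and keeps one output row per input record, which is the layout the desired result shows. Calling `DataFrame.pivot` on the date index directly would raise an error, because the pair (2007-09-26, C00119040) occurs twice; resetting to a plain RangeIndex first avoids that. Note that a pivot creates each committee ID only once, so C00119040 and C00346296 appear as a single column each rather than repeated as in the header above.

```python
import pandas as pd

# The question's sample rows, keeping only the columns the reshape needs
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2007-08-15", "2007-09-26", "2007-09-26", "2007-07-20", "2007-09-24"]
        ),
        "cmte_id": ["C00112250", "C00119040", "C00119040", "C00346296", "C00346296"],
        "amount": [2000, 1000, 1000, 1000, 1000],
    }
).set_index("date")

# With a RangeIndex every record gets its own row; each amount lands in the
# column named by its cmte_id and the remaining cells are NaN
wide = df.reset_index().pivot(columns="cmte_id", values="amount")

# Put the (possibly repeated) dates back as the index, row for row
wide.index = df.index
print(wide)
```

Because the rows are never aggregated, the two 2007-09-26 donations to C00119040 stay on separate rows, at the cost of NaN cells and float dtype in the amount columns.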


Answer by a site user · posted 2024-09-28 23:47:58

Try this:

# one column per cmte_id; amounts sharing the same date and cmte_id are summed
pvt = pd.pivot_table(df, index=df.index, columns='cmte_id',
                     values='amount', aggfunc='sum', fill_value=0)
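Running the answer's `pivot_table` call on the question's sample rows (rebuilt by hand here, with only the columns the call needs) shows one behavior worth knowing: rows that share both a date and a `cmte_id` are merged. The two 2007-09-26 rows for C00119040 become a single row holding 2000, because `aggfunc='sum'` adds them, and `fill_value=0` puts 0 where a committee has no amount on that date.

```python
import pandas as pd

# The question's sample rows, keeping only the columns the pivot needs
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2007-08-15", "2007-09-26", "2007-09-26", "2007-07-20", "2007-09-24"]
        ),
        "cmte_id": ["C00112250", "C00119040", "C00119040", "C00346296", "C00346296"],
        "amount": [2000, 1000, 1000, 1000, 1000],
    }
).set_index("date")

# The answer's call: one row per distinct date, one column per cmte_id,
# summing amounts that share a (date, cmte_id) pair and filling gaps with 0
pvt = pd.pivot_table(df, index=df.index, columns="cmte_id",
                     values="amount", aggfunc="sum", fill_value=0)
print(pvt)
```

The result has four rows (one per distinct date, sorted) rather than five; if each record must stay separate, the grouping keys need to include something that tells the duplicates apart.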

To keep the other columns:

[code not preserved]
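One way to keep the descriptive columns, sketched here under the assumption that they should act as grouping keys: list them in `index` together with the date, then call `reset_index()` so they come back as ordinary columns. This is the same idea the update applies with `trans_typ`, `entity_typ`, `state` and `fec_id`; the sketch uses the question's own column names (`trans`, `entity`, `st`). With `pivot_table`'s default `dropna=True`, rows with a missing value in any of those key columns are left out, so it is worth checking the real data for gaps.

```python
import pandas as pd

# The question's sample rows, including the descriptive columns
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2007-08-15", "2007-09-26", "2007-09-26", "2007-07-20", "2007-09-24"]
        ),
        "cmte_id": ["C00112250", "C00119040", "C00119040", "C00346296", "C00346296"],
        "trans": ["24K"] * 5,
        "entity": ["ORG", "CCM", "CCM", "CCM", "CCM"],
        "st": ["DC", "FL", "MD", "CA", "MA"],
        "amount": [2000, 1000, 1000, 1000, 1000],
        "fec_id": ["C00431569", "C00367680", "C00140715", "C00434571", "C00433136"],
    }
).set_index("date")

# Date plus descriptive columns form the row key, so the two 2007-09-26 rows
# (different st / fec_id) stay apart; reset_index turns the keys into columns
pvt = (
    pd.pivot_table(
        df.reset_index(),
        index=["date", "trans", "entity", "st", "fec_id"],
        columns="cmte_id",
        values="amount",
        aggfunc="sum",
        fill_value=0,
    )
    .reset_index()
)
print(pvt)
```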

Update:

import pandas as pd
import glob


# if you don't use ['cand_id'] column - remove it from `usecols` parameter
dfy = pd.concat([pd.read_csv(f, sep='|', low_memory=False, header=None,
                             names=['cmte_id', '2', '3', '4','5', 'trans_typ', 'entity_typ', '8', '9', 'state', '11', 'employer', 'occupation', 'date', 'amount', 'fec_id', 'cand_id', '18', '19', '20', '21', '22'],
                             usecols= ['date', 'cmte_id', 'trans_typ', 'entity_typ', 'state', 'amount', 'fec_id', 'cand_id'],
                             dtype={'date': str})
                 for f in glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas2**.txt')
                ],
                ignore_index=True) 

dfy['date'] = pd.to_datetime(dfy['date'], format='%m%d%Y')

# remove not needed column ASAP in order to save memory
del dfy['cand_id']

dfy = dfy[(dfy['date'].notnull()) & (dfy['date'] > '2007-01-01') & (dfy['date'] < '2014-12-31') ]

#df = dfy.set_index(['date'])

pvt = pd.pivot_table(dfy, index=['date','trans_typ','entity_typ','state','fec_id'],
                     columns='cmte_id', values='amount', aggfunc='sum', fill_value=0) \
        .reset_index()


pvt.info()  # info() prints the summary itself and returns None

pvt.to_excel('out.xlsx', index=False)
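The reading and date-handling steps of the update can be tried without the FEC files. The sketch below is a hypothetical, trimmed-down stand-in: an in-memory pipe-separated text with no header and only 5 fields per record instead of the real 22. The first two records reuse values from the question's table; the last two are made up to exercise the missing-date and date-range filters. It also shows why `dtype={'date': str}` matters: MMDDYYYY dates such as 08152007 keep their leading zero, so `format='%m%d%Y'` parses them as intended.

```python
import io

import pandas as pd

# Hypothetical stand-in for one itpas2 file (5 fields instead of 22)
raw = io.StringIO(
    "C00112250|24K|08152007|2000|C00431569\n"
    "C00119040|24K|09262007|1000|C00367680\n"
    "C00346296|24K||1000|C00434571\n"
    "C00346296|24K|01052015|1000|C00433136\n"
)

dfy = pd.read_csv(
    raw,
    sep="|",
    header=None,
    names=["cmte_id", "trans_typ", "date", "amount", "fec_id"],
    usecols=["cmte_id", "date", "amount", "fec_id"],  # trans_typ is skipped
    dtype={"date": str},  # keep leading zeros in MMDDYYYY dates
)

# Empty date fields come through as NaN and become NaT here
dfy["date"] = pd.to_datetime(dfy["date"], format="%m%d%Y")

# Same filter as the update: drop missing dates, keep 2007-2014
dfy = dfy[dfy["date"].notnull()
          & (dfy["date"] > "2007-01-01")
          & (dfy["date"] < "2014-12-31")]
print(dfy)
```

Only the two 2007 records survive: the row with an empty date becomes NaT and is dropped, and the 2015 row falls outside the range.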
