Convert specific column values to headers and headers to column values

Posted 2024-09-28 23:44:22


I have a CSV like this:

Market_id   Metric  New_Mar_19  New_Feb_19  tot_Mar_19  tot_Feb_19  
7528            1       9           7           6.5         8           
7528            2       6           3.2         2.2         3           

I want to convert the CSV above into the following:

  Month         Nrx_count       Nrx_quantity    Trx_count       Trx_quantity

    Mar_19      9                   6               6.5             2.2
    Feb_19      7                   3.2             8               3

I tried using pivot and pd.melt, but I could not get the result shown above.


Tags: csv, id, new, count, metric, market, mar, feb
1 answer
User
Answer 1 · Posted 2024-09-28 23:44:22

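The idea is to move the identifier columns (Market_id and Metric) into a MultiIndex with set_index, split the remaining column names on the first _, reshape with DataFrame.stack, and then reshape again with DataFrame.unstack.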
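
To follow along, the question's sample can be rebuilt by hand. This is only a sketch: the values are copied from the question, the float dtypes are an assumption, and in practice df would more likely come from pd.read_csv. The last line shows what splitting on the first underscore does to the column names.

```python
import pandas as pd

# Sample data copied from the question (dtypes are an assumption).
df = pd.DataFrame({
    'Market_id': [7528, 7528],
    'Metric': [1, 2],
    'New_Mar_19': [9.0, 6.0],
    'New_Feb_19': [7.0, 3.2],
    'tot_Mar_19': [6.5, 2.2],
    'tot_Feb_19': [8.0, 3.0],
})

# Splitting with n=1 cuts only at the first underscore, so the month
# label 'Mar_19' stays in one piece; expand=True returns a MultiIndex.
value_cols = df.columns.drop(['Market_id', 'Metric'])
split_cols = value_cols.str.split('_', n=1, expand=True)
print(split_cols)
```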

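#assumes pandas is imported as pd and the CSV is loaded into df
#move the identifier columns into the index
df1 = df.set_index(['Market_id','Metric'])
#split on the first '_' only: 'New_Mar_19' -> ('New', 'Mar_19')
df1.columns = df1.columns.str.split('_', n=1, expand=True)
#move the month level from the columns into the index
df1 = df1.stack()
print (df1)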
                         New  tot
Market_id Metric                 
7528      1      Feb_19  7.0  8.0
                 Mar_19  9.0  6.5
          2      Feb_19  3.2  3.0
                 Mar_19  6.0  2.2

#second level, here Metric is converted to columns
df1 = df1.unstack(1)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19    7.0    3.2    8.0    3.0
1       7528  Mar_19    9.0    6.0    6.5    2.2

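#first level, here Market_id is converted to columns
#(alternative to the step above, so start again from the stacked frame)
df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack().unstack(0)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)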
print (df1)
   Metric       y  New_7528  tot_7528
0       1  Feb_19       7.0       8.0
1       1  Mar_19       9.0       6.5
2       2  Feb_19       3.2       3.0
3       2  Mar_19       6.0       2.2
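
To land on the exact column names from the question, the Metric variant can be followed by a rename. The sketch below is self-contained and rests on one assumption inferred from the question's expected output: Metric 1 is the "count" and Metric 2 is the "quantity", with New mapping to Nrx and tot mapping to Trx. The level names kind and Month and the names dictionary are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Market_id': [7528, 7528],
    'Metric': [1, 2],
    'New_Mar_19': [9.0, 6.0],
    'New_Feb_19': [7.0, 3.2],
    'tot_Mar_19': [6.5, 2.2],
    'tot_Feb_19': [8.0, 3.0],
})

wide = df.set_index(['Market_id', 'Metric'])
wide.columns = wide.columns.str.split('_', n=1, expand=True)
wide.columns.names = ['kind', 'Month']  # illustrative level names

# Month goes from the columns to the index, Metric from the index to the columns.
out = wide.stack('Month').unstack('Metric')

# Assumed mapping, inferred from the expected output in the question.
names = {
    ('New', 1): 'Nrx_count', ('New', 2): 'Nrx_quantity',
    ('tot', 1): 'Trx_count', ('tot', 2): 'Trx_quantity',
}
out.columns = [names[(kind, int(metric))] for kind, metric in out.columns]
out = out.reset_index()
print(out)
```

Naming the column levels also means reset_index produces a Month column directly, so no rename from level_1 is needed.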

Edit:

If you get the error:

ValueError: Index contains duplicate entries, cannot reshape

it means the data contains duplicates. The solution is the same; you only need to add an aggregation, for example mean or sum.

#sample data changed so that the index columns (Market_id, Metric) contain duplicates
print (df)
   Market_id  Metric  New_Mar_19  New_Feb_19  tot_Mar_19  tot_Feb_19
0       7528       1           2         5.0         5.0           8
1       7528       1          10        20.0         2.5          16
2       7528       2           6         3.2         2.2           3  
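
Before choosing an aggregation, it can help to see which rows actually collide. A minimal sketch, assuming the duplicated sample above is in df, uses DataFrame.duplicated on the two key columns; the names keys and dupes are illustrative.

```python
import pandas as pd

# Duplicated sample data copied from the printout above.
df = pd.DataFrame({
    'Market_id': [7528, 7528, 7528],
    'Metric': [1, 1, 2],
    'New_Mar_19': [2, 10, 6],
    'New_Feb_19': [5.0, 20.0, 3.2],
    'tot_Mar_19': [5.0, 2.5, 2.2],
    'tot_Feb_19': [8, 16, 3],
})

keys = ['Market_id', 'Metric']
# keep=False marks every member of a duplicated group, not only the repeats.
dupes = df[df.duplicated(keys, keep=False)]
print(dupes)
```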

df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print (df1)
                          New   tot
Market_id Metric                   
7528      1      Feb_19   5.0   8.0
                 Mar_19   2.0   5.0
                 Feb_19  20.0  16.0
                 Mar_19  10.0   2.5
          2      Feb_19   3.2   3.0
                 Mar_19   6.0   2.2

#number of index levels to pass to groupby - here 3 levels -> 0,1,2
print (df1.index.nlevels)
3

#group by all index levels and aggregate (mean, sum, max...), then unstack Metric
df1 = df1.groupby(level=[0,1,2]).mean().unstack(1)
print (df1)
                   New         tot     
Metric               1    2      1    2
Market_id                              
7528      Feb_19  12.5  3.2  12.00  3.0
          Mar_19   6.0  6.0   3.75  2.2

df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19   12.5    3.2  12.00    3.0
1       7528  Mar_19    6.0    6.0   3.75    2.2
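
Since the question mentions pivot and pd.melt, here is a sketch of the same reshape done that way. It is an alternative to the stack/unstack answer, not part of it: melt makes the table long, the column name is split on the first underscore, and pivot_table aggregates the duplicates with mean. The names long_df, col, value, kind and Month are illustrative.

```python
import pandas as pd

# Duplicated sample data copied from the answer.
df = pd.DataFrame({
    'Market_id': [7528, 7528, 7528],
    'Metric': [1, 1, 2],
    'New_Mar_19': [2, 10, 6],
    'New_Feb_19': [5.0, 20.0, 3.2],
    'tot_Mar_19': [5.0, 2.5, 2.2],
    'tot_Feb_19': [8, 16, 3],
})

# One row per (Market_id, Metric, original column name).
long_df = df.melt(id_vars=['Market_id', 'Metric'], var_name='col', value_name='value')

# 'New_Mar_19' -> kind='New', Month='Mar_19'
long_df[['kind', 'Month']] = long_df['col'].str.split('_', n=1, expand=True)

# pivot_table aggregates duplicated keys instead of raising an error.
out = long_df.pivot_table(index=['Market_id', 'Month'],
                          columns=['kind', 'Metric'],
                          values='value',
                          aggfunc='mean')
out.columns = [f'{kind}_{metric}' for kind, metric in out.columns]
out = out.reset_index()
print(out)
```

Swapping aggfunc='mean' for 'sum' or 'max' changes how the duplicated rows are combined.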
