将特定列值转换为标题并将标题转换为列值

Market_id Metric New_Mar_19 New_Feb_19 tot_Mar_19 tot_Feb_19 7528 1 9 7 6.5 8 7528 2 6 3.2 2.2 3

Month Nrx_count Nrx_quantity Trx_count Trx_quantity Mar_19 9 6 6.5 2.2 Feb_19 7 3.2 8 3

1条回答

网友

1楼 · 发布于 2024-09-28 23:44:22

其思想是将不带_的列转换为MultiIndex，然后按_拆分列，再按^{}重塑，然后按^{}：

df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print (df1)
                         New  tot
Market_id Metric                 
7528      1      Feb_19  7.0  8.0
                 Mar_19  9.0  6.5
          2      Feb_19  3.2  3.0
                 Mar_19  6.0  2.2

#second level, here Metric is converted to columns
df1 = df1.unstack(1)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19    7.0    3.2    8.0    3.0
1       7528  Mar_19    9.0    6.0    6.5    2.2

#first level, here Market_id is converted to columns
df1 = df1.unstack(0)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Metric       y  New_7528  tot_7528
0       1  Feb_19       7.0       8.0
1       1  Mar_19       9.0       6.5
2       2  Feb_19       3.2       3.0
3       2  Mar_19       6.0       2.2

编辑：

如果获取错误：

ValueError: Index contains duplicate entries, cannot reshape

意思是有重复的。解决方案是相同的，只需要添加聚合-例如通过mean，sum。。。你知道吗

#change data for duplicates in column for Multiindex - Market_id, Metric
print (df)
   Market_id  Metric  New_Mar_19  New_Feb_19  tot_Mar_19  tot_Feb_19
0       7528       1           2         5.0         5.0           8
1       7528       1          10        20.0         2.5          16
2       7528       2           6         3.2         2.2           3  

df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print (df1)
                          New   tot
Market_id Metric                   
7528      1      Feb_19   5.0   8.0
                 Mar_19   2.0   5.0
                 Feb_19  20.0  16.0
                 Mar_19  10.0   2.5
          2      Feb_19   3.2   3.0
                 Mar_19   6.0   2.2

#number of levels passes to groupby - here 3 levels -> 0,1,2
print (df1.index.nlevels)
3

#grouping by all levels and aggregate mean, sum,max...
df1 = df1.groupby(level=[0,1,2]).mean().unstack(1)
print (df1)
                   New         tot     
Metric               1    2      1    2
Market_id                              
7528      Feb_19  12.5  3.2  12.00  3.0
          Mar_19   6.0  6.0   3.75  2.2

df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19   12.5    3.2  12.00    3.0
1       7528  Mar_19    6.0    6.0   3.75    2.2

相关问题更多 >

编程相关推荐

热门问题

热门文章

将特定列值转换为标题并将标题转换为列值

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >