Posted 2024-09-28 23:44:22
User
I have a CSV like this:
Market_id  Metric  New_Mar_19  New_Feb_19  tot_Mar_19  tot_Feb_19
7528       1       9           7           6.5         8
7528       2       6           3.2         2.2         3
I want to convert the CSV above into the following:
Month   Nrx_count  Nrx_quantity  Trx_count  Trx_quantity
Mar_19  9          6             6.5        2.2
Feb_19  7          3.2           8          3
I tried pivot and pd.melt, but could not get the result shown above.
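The idea is to move the columns without _ (Market_id and Metric) into a MultiIndex with set_index, split the remaining column names on _, then reshape with stack and then with unstack: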
df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print(df1)
                         New  tot
Market_id Metric
7528      1      Feb_19  7.0  8.0
                 Mar_19  9.0  6.5
          2      Feb_19  3.2  3.0
                 Mar_19  6.0  2.2

#second level, here Metric is converted to columns
df1 = df1.unstack(1)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print(df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19    7.0    3.2    8.0    3.0
1       7528  Mar_19    9.0    6.0    6.5    2.2

#first level, here Market_id is converted to columns
#(alternative to the Metric variant: apply this to df1 as it is right after stack())
df1 = df1.unstack(0)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print(df1)
   Metric       y  New_7528  tot_7528
0       1  Feb_19       7.0       8.0
1       1  Mar_19       9.0       6.5
2       2  Feb_19       3.2       3.0
3       2  Mar_19       6.0       2.2
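
To get from the Metric variant to the exact layout asked for in the question, the column labels can be renamed after the unstack step. A minimal sketch, assuming that New maps to Nrx, tot maps to Trx, Metric 1 means count and Metric 2 means quantity (the question does not state these mappings; prefix_names and metric_names are hypothetical names), and that there is only one Market_id so it can be dropped:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Market_id": [7528, 7528],
    "Metric": [1, 2],
    "New_Mar_19": [9, 6],
    "New_Feb_19": [7, 3.2],
    "tot_Mar_19": [6.5, 2.2],
    "tot_Feb_19": [8, 3],
})

# Assumed label mappings; not confirmed by the question.
prefix_names = {"New": "Nrx", "tot": "Trx"}
metric_names = {1: "count", 2: "quantity"}

# Same reshaping as in the answer: index on the key columns,
# split "New_Mar_19" into ("New", "Mar_19"), stack the month level,
# then move Metric into the columns.
df1 = df.set_index(["Market_id", "Metric"])
df1.columns = df1.columns.str.split("_", n=1, expand=True)
df1 = df1.stack().unstack("Metric")

# Flatten the (prefix, Metric) column pairs using the assumed mappings.
df1.columns = [f"{prefix_names[a]}_{metric_names[int(b)]}" for a, b in df1.columns]

# Drop Market_id (assumes a single market) and name the month level.
result = (
    df1.reset_index(level="Market_id", drop=True)
       .rename_axis("Month")
       .reset_index()
)
print(result)
```

The order of the Month rows can differ between pandas versions, so sort afterwards if the order matters.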
Edit:
If you get the error:
ValueError: Index contains duplicate entries, cannot reshape
it means there are duplicate Market_id/Metric combinations. The solution is the same, you only need to add an aggregation step, for example mean or sum:
#change data for duplicates in column for Multiindex - Market_id, Metric
print(df)
   Market_id  Metric  New_Mar_19  New_Feb_19  tot_Mar_19  tot_Feb_19
0       7528       1           2         5.0         5.0           8
1       7528       1          10        20.0         2.5          16
2       7528       2           6         3.2         2.2           3

df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print(df1)
                         New   tot
Market_id Metric
7528      1      Feb_19   5.0   8.0
                 Mar_19   2.0   5.0
                 Feb_19  20.0  16.0
                 Mar_19  10.0   2.5
          2      Feb_19   3.2   3.0
                 Mar_19   6.0   2.2

#number of levels passes to groupby - here 3 levels -> 0,1,2
print(df1.index.nlevels)
3

#grouping by all levels and aggregate mean, sum,max...
df1 = df1.groupby(level=[0,1,2]).mean().unstack(1)
print(df1)
                   New        tot
Metric               1    2      1    2
Market_id
7528      Feb_19  12.5  3.2  12.00  3.0
          Mar_19   6.0  6.0   3.75  2.2

df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print(df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19   12.5    3.2  12.00    3.0
1       7528  Mar_19    6.0    6.0   3.75    2.2
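
Before reshaping, it can help to see which key pairs are repeated. A minimal sketch, assuming the duplicated sample data from the edit above; keys, dupes and df_unique are illustrative names. Aggregating on the original frame first is a variation on the groupby-after-stack step, not the answer's exact code, but with mean it gives the same numbers for this data:

```python
import pandas as pd

# Sample data with a repeated (Market_id, Metric) pair, as in the edit.
df = pd.DataFrame({
    "Market_id": [7528, 7528, 7528],
    "Metric": [1, 1, 2],
    "New_Mar_19": [2, 10, 6],
    "New_Feb_19": [5.0, 20.0, 3.2],
    "tot_Mar_19": [5.0, 2.5, 2.2],
    "tot_Feb_19": [8, 16, 3],
})

keys = ["Market_id", "Metric"]

# keep=False marks every row of a repeated key pair, not only the later ones.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# Collapse repeated key pairs with an aggregate (mean here; sum, max, ... also work).
df_unique = df.groupby(keys, as_index=False).mean()
print(df_unique)
```

After this step each Market_id/Metric pair occurs once, so the set_index, stack and unstack sequence should no longer hit the duplicate-entries error.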