Rearranging columns that share a name into a single column by reshaping a DataFrame

Posted 2024-10-01 02:27:18


I have a DataFrame with the following structure:

ID Material Description color size dim color size dim Tech
1  xcv456    Rubber       101   s   32  102    m   34  elastic

I would like to convert it to:

ID Material Description color size dim tech
1  xcv456   Rubber       101   s    32  elastic
1  xcv456   Rubber       102   m    34  elastic

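I have a file with 5 rows and 5414 columns, so I am trying to automate the process of detecting the redundant columns and converting them into the desired output format. Any help would be greatly appreciated.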
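For anyone reproducing the problem, here is a minimal sketch of the sample input and of how the repeated column names could be detected. The variable names (`dupe_mask`, `repeated`, `id_cols`) are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Sample row from the question; pandas allows duplicate column labels.
df = pd.DataFrame(
    [[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"]],
    columns=["ID", "Material", "Description",
             "color", "size", "dim", "color", "size", "dim", "Tech"],
)

# True for every column whose name occurs more than once.
dupe_mask = df.columns.duplicated(keep=False)

# Names that repeat (to be folded into rows) and names that occur once.
repeated = df.columns[dupe_mask].unique().tolist()
id_cols = df.columns[~dupe_mask].tolist()

print(repeated)
print(id_cols)
```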


2 answers

Use:

# mask marking every column whose name occurs more than once
m = df.columns.duplicated(keep=False)
# move the non-duplicated columns into the index
df = df.set_index(df.columns[~m].tolist())
# number each occurrence of a repeated name to build a MultiIndex
s = df.columns.to_series()
df.columns = [df.columns, s.groupby(s).cumcount()]
# stack the occurrence counter into rows, then drop it: it is index level 4
# because the 4 non-duplicated columns occupy levels 0-3
df = df.stack().reset_index(level=4, drop=True).reset_index()
print(df)
   ID Material Description     Tech  color  dim size
0   1   xcv456      Rubber  elastic    101   32    s
1   1   xcv456      Rubber  elastic    102   34    m

# the same steps applied to an input with two rows
print(df)
   ID Material Description  color size  dim  color size  dim      Tech
0   1   xcv456      Rubber    101    s   32    102    m   34   elastic
1   2   xcv457     Rubber1    101    s   37    108    m   55  elastic2

# mask marking every column whose name occurs more than once
m = df.columns.duplicated(keep=False)
# move the non-duplicated columns into the index
df = df.set_index(df.columns[~m].tolist())
# number each occurrence of a repeated name to build a MultiIndex
s = df.columns.to_series()
df.columns = [df.columns, s.groupby(s).cumcount()]
# stack the occurrence counter into rows, then drop it (index level 4)
df = df.stack().reset_index(level=4, drop=True).reset_index()
print(df)
   ID Material Description      Tech  color  dim size
0   1   xcv456      Rubber   elastic    101   32    s
1   1   xcv456      Rubber   elastic    102   34    m
2   2   xcv457     Rubber1  elastic2    101   37    s
3   2   xcv457     Rubber1  elastic2    108   55    m
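The `level=4` above is tied to this example having exactly four non-duplicated columns. As a rough sketch, the same steps can be wrapped so that the level is derived from the data. The helper name `stack_repeated` is hypothetical, and the sketch assumes every repeated name occurs the same number of times. Depending on the pandas version, `stack()` may emit a `FutureWarning` and the order of the value columns may differ.

```python
import pandas as pd


def stack_repeated(df):
    """Hypothetical helper: fold repeated column names into extra rows."""
    dupe_mask = df.columns.duplicated(keep=False)
    id_cols = df.columns[~dupe_mask].tolist()

    # Non-repeated columns become the index so they are carried along.
    out = df.set_index(id_cols)

    # Pair each repeated name with its occurrence number (0, 1, ...).
    names = pd.Series(out.columns)
    occurrence = names.groupby(names).cumcount()
    out.columns = pd.MultiIndex.from_arrays([names, occurrence])

    # Stack the occurrence level into rows, then drop it. It sits right
    # after the id columns, i.e. at position len(id_cols) in the index.
    stacked = out.stack()
    return stacked.reset_index(level=len(id_cols), drop=True).reset_index()


wide = pd.DataFrame(
    [[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"],
     [2, "xcv457", "Rubber1", 101, "s", 37, 108, "m", 55, "elastic2"]],
    columns=["ID", "Material", "Description",
             "color", "size", "dim", "color", "size", "dim", "Tech"],
)
long_df = stack_repeated(wide)
print(long_df)
```

Deriving the level from `len(id_cols)` should keep the helper usable on wider files where the number of non-repeated columns is not known in advance.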

pd.wide_to_long needs a little preprocessing before it can be used:

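import pandas as pd

# append a running occurrence number to every column name (ID1, color1, color2, ...)
hh = pd.Series(df.columns)
df.columns = hh + hh.groupby(hh).cumcount().add(1).astype(str)
# gather color1/color2, ... under their stub names, then drop the helper suffix column
pd.wide_to_long(df, ['color', 'size', 'dim'], i=['ID1', 'Material1', 'Description1', 'Tech1'], j='drop').reset_index().drop(columns='drop')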
Out[556]: 
   ID1 Material1 Description1    Tech1  color size  dim
0    1    xcv456       Rubber  elastic    101    s   32
1    1    xcv456       Rubber  elastic    102    m   34
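A possible variation, sketched here under the assumption that the identifier columns should keep their original names: number only the repeated columns, then pass the detected names to `pd.wide_to_long`. The names `occurrence`, `stubs` and `id_cols` are illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"],
     [2, "xcv457", "Rubber1", 101, "s", 37, 108, "m", 55, "elastic2"]],
    columns=["ID", "Material", "Description",
             "color", "size", "dim", "color", "size", "dim", "Tech"],
)

names = pd.Series(df.columns)
dupe_mask = names.duplicated(keep=False)
stubs = names[dupe_mask].unique().tolist()   # repeated names
id_cols = names[~dupe_mask].tolist()         # names that occur once

# Suffix only the repeated names: color -> color1, color2; ID stays ID.
suffix = names.groupby(names).cumcount().add(1).astype(str)
df.columns = names.where(~dupe_mask, names + suffix)

# The id columns must uniquely identify each row for wide_to_long.
long_df = (
    pd.wide_to_long(df, stubnames=stubs, i=id_cols, j="occurrence")
    .reset_index()
    .drop(columns="occurrence")
)
print(long_df)
```

Because the identifier columns are left unsuffixed, the result has `ID`, `Material`, `Description` and `Tech` rather than `ID1`, `Material1` and so on.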
