How to transform a dataframe so that column values become row values

Posted 2024-09-30 20:34:54


I have a dataframe like the following:

import pandas as pd

df = pd.DataFrame({'fruit': ['berries', 'berries', 'berries', 'tropical',
                             'tropical', 'tropical', 'berries', 'nuts'],
                   'code': [100, 100, 100, 200, 200, 300, 400, 500],
                   'subcode': ['100A', '100B', '100C', '200A', '200B', '300A',
                               '400A', '500A']})


    code    fruit   subcode
  0 100     berries 100A
  1 100     berries 100B
  2 100     berries 100C
  3 200     tropical 200A
  4 200     tropical 200B
  5 300     tropical 300A
  6 400     berries 400A
  7 500     nuts    500A

I want to transform the dataframe into the following format:

    code    fruit    subcode1 subcode2 subcode3
  0 100     berries  100A     100B     100C
  3 200     tropical 200A     200B
  5 300     tropical 300A
  6 400     berries  400A
  7 500     nuts     500A

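Unfortunately, I'm stuck on how to proceed. I've looked at posts such as "Unmelt Pandas DataFrame", which use a combination of stack and unstack. I suspect some concatenation is involved as well. Any pointers in the right direction would be greatly appreciated!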
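For reference, here is a minimal sketch of another common pandas idiom. It does not come from the answers on this page. The idea is to number each row within its (code, fruit) group with `groupby().cumcount()` and then pivot on that counter. This yields the subcode1/subcode2/subcode3 naming from the desired output, and it does not depend on what the subcode strings look like. It assumes pandas 1.1 or newer, where `pivot` accepts a list for `index`. The helper column name `pos` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['berries', 'berries', 'berries', 'tropical',
              'tropical', 'tropical', 'berries', 'nuts'],
    'code': [100, 100, 100, 200, 200, 300, 400, 500],
    'subcode': ['100A', '100B', '100C', '200A', '200B', '300A',
                '400A', '500A'],
})

# Number each row within its (code, fruit) group: 1, 2, 3, ...
# 'pos' is a hypothetical helper column name.
pos = df.groupby(['code', 'fruit']).cumcount() + 1

wide = (
    df.assign(pos=pos)
      .pivot(index=['code', 'fruit'], columns='pos', values='subcode')
      .add_prefix('subcode')        # columns 1, 2, 3 -> subcode1, subcode2, subcode3
      .reset_index()                # turn code/fruit back into ordinary columns
      .rename_axis(None, axis=1)    # drop the leftover 'pos' name on the columns
)
print(wide)
```

Groups with fewer subcodes get NaN in the trailing columns. Chain `.fillna('')` if blanks are preferred.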


3 answers

Play around with set_index and unstack and you'll get there.

(df.set_index(['code', 'fruit'])
   .set_index(df.subcode.str.extract('([a-zA-Z]+)', expand=False), append=True)
   .subcode
   .unstack()
   .fillna('')                  # these last three
   .reset_index()               # calls are only
   .rename_axis(None, axis=1)   # cosmetic
)

   code     fruit     A     B     C
0   100   berries  100A  100B  100C
1   200  tropical  200A  200B      
2   300  tropical  300A            
3   400   berries  400A            
4   500      nuts  500A            
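The same reshaping can also be written with `pivot`, which performs a `set_index` followed by an `unstack`. This may make the key step easier to see. The sketch below is an equivalent rewrite, not the answerer's code. It moves the letter suffix of each subcode ('100A' becomes 'A') into a helper column, here given the hypothetical name `letter`, and uses it as the new column label. It assumes pandas 1.1 or newer for the list-valued `index` argument.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['berries', 'berries', 'berries', 'tropical',
              'tropical', 'tropical', 'berries', 'nuts'],
    'code': [100, 100, 100, 200, 200, 300, 400, 500],
    'subcode': ['100A', '100B', '100C', '200A', '200B', '300A',
                '400A', '500A'],
})

# Column label = the letter suffix of each subcode ('100A' -> 'A')
letter = df['subcode'].str.extract(r'([A-Za-z]+)', expand=False)

out = (
    df.assign(letter=letter)        # 'letter' is a hypothetical helper column
      .pivot(index=['code', 'fruit'], columns='letter', values='subcode')
      .fillna('')                   # cosmetic: blank instead of NaN
      .reset_index()
      .rename_axis(None, axis=1)
)
print(out)
```

This approach relies on the letter being unique within each (code, fruit) group. If two rows in a group share a letter, both `pivot` and `unstack` raise a ValueError about duplicate entries.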

Using collections.defaultdict:

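from collections import defaultdict

import pandas as pd

# Collect the subcodes of each (fruit, code) pair, preserving row order
d = defaultdict(list)

for f, c, s in df.itertuples(index=False):  # columns: fruit, code, subcode
    d[(f, c)].append(s)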

pd.DataFrame.from_dict(
    {k: dict(enumerate(v)) for k, v in d.items()}, orient='index'
).add_prefix('subcode').rename_axis(['fruit', 'code']).reset_index()

      fruit  code subcode0 subcode1 subcode2
0   berries   100     100A     100B     100C
1   berries   400     400A      NaN      NaN
2      nuts   500     500A      NaN      NaN
3  tropical   200     200A     200B      NaN
4  tropical   300     300A      NaN      NaN

You can use groupby, take the values of each group, and convert them into a Series.

(df.groupby(['code', 'fruit'])['subcode']
   .apply(lambda x: x.values)   # one array of subcodes per (code, fruit) group
   .apply(pd.Series)            # expand each array into its own columns
   .add_prefix('subcode_'))

                subcode_0 subcode_1 subcode_2
code fruit                                 
100  berries       100A      100B      100C
200  tropical      200A      200B       NaN
300  tropical      300A       NaN       NaN
400  berries       400A       NaN       NaN
500  nuts          500A       NaN       NaN
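A variation on this answer skips the per-row `pd.Series` construction, which can be slow on large frames. Each group is collected into a plain Python list with `agg(list)`, and the resulting ragged lists are handed to the `DataFrame` constructor, which pads shorter rows with missing values. This is a sketch of a swapped-in technique, not the answerer's code.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['berries', 'berries', 'berries', 'tropical',
              'tropical', 'tropical', 'berries', 'nuts'],
    'code': [100, 100, 100, 200, 200, 300, 400, 500],
    'subcode': ['100A', '100B', '100C', '200A', '200B', '300A',
                '400A', '500A'],
})

# One Python list of subcodes per (code, fruit) group
lists = df.groupby(['code', 'fruit'])['subcode'].agg(list)

# Ragged lists -> rectangular frame; shorter rows are padded with missing values
wide = pd.DataFrame(lists.tolist(), index=lists.index).add_prefix('subcode_')
print(wide)
```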
