合并多索引数据帧

2024-07-01 06:53:42 发布

您现在位置:Python中文网/ 问答频道 /正文

考虑以下两个数据帧:

arrays1 = [['foo', 'bar', 'bar', 'bar'],
          ['A', 'D', 'E', 'F']]
tuples1 = list(zip(*arrays1))          
columnValues1 = pd.MultiIndex.from_tuples(tuples1)
df1 = pd.DataFrame(np.random.rand(4,4), columns = columnValues1)
print(df1)
        foo       bar                    
          A         D         E         F
0  0.833444  0.354676  0.468294  0.173005
1  0.409730  0.275342  0.595433  0.322785
2  0.515161  0.340063  0.117509  0.491957
3  0.285594  0.970524  0.322902  0.628351

arrays2 = [['foo', 'foo', 'bar', 'bar'],
          ['B', 'C', 'G', 'H']]
tuples2 = list(zip(*arrays2))          
columnValues2 = pd.MultiIndex.from_tuples(tuples2)
df2 = pd.DataFrame(np.random.rand(4,4), columns = columnValues2)
print(df2)
        foo                 bar          
          B         C         G         H
0  0.208822  0.762884  0.424412  0.583324
1  0.767560  0.884583  0.716843  0.329719
2  0.147991  0.424748  0.560599  0.828155
3  0.376050  0.436354  0.704379  0.406324

假设我要合并这些以得到:

          foo                                bar                
            A           B          C           D           E           F           G           H
0    0.833444    0.208822   0.762884    0.354676    0.468294    0.173005    0.424412    0.583324
1    0.409730    0.767560   0.884583    0.275342    0.595433    0.322785    0.716843    0.329719
2    0.515161    0.147991   0.424748    0.340063    0.117509    0.491957    0.560599    0.828155
3    0.285594    0.376050   0.436354    0.970524    0.322902    0.628351    0.704379    0.406324

我试着通过以下方式合并:

pd.merge(df1.reset_index(), df2.reset_index(), on=df1.columns.levels[0], 
how='inner').set_index(df1.columns.levels[0])

很遗憾,我收到以下错误消息:

ValueError: The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all()

如何合并2个多索引数据帧? `你知道吗


Tags: columns数据fromindexfoobarziplist
2条回答

这并不是真正的“合并”,因为您并没有真正匹配数据帧之间的值,您只是并排添加一些列。所以pd.concat做你需要的:

combined = pd.concat([df1, df2], axis=1)
combined.sort_index(axis=1, inplace=True)

combined
Out[13]: 
        bar                                               foo            \
          D         E         F         G         H         A         B   
0  0.915879  0.712345  0.460795  0.529782  0.161578  0.803505  0.133896   
1  0.234319  0.317113  0.477687  0.525108  0.495104  0.107596  0.374732   
2  0.149397  0.244950  0.866735  0.501562  0.758321  0.508689  0.635703   
3  0.330018  0.204695  0.598899  0.522993  0.306496  0.936768  0.638874   


          C  
0  0.614592  
1  0.824297  
2  0.482161  
3  0.792035   

更新:动态选择列:

In [57]: join = df1.join(df2)

In [58]: cols = join.columns.get_level_values(0).unique()

In [59]: cols
Out[59]: array(['foo', 'bar'], dtype=object)

In [60]: join = join[cols]

In [61]: join
Out[61]:
        foo                           bar                                \
          A         B         C         D         E         F         G
0  0.176934  0.694937  0.947164  0.510407  0.085626  0.162183  0.382840
1  0.973283  0.743907  0.886495  0.028961  0.740759  0.330742  0.961932
2  0.898224  0.966278  0.131551  0.517563  0.026104  0.624047  0.848640
3  0.713660  0.704461  0.419997  0.718130  0.252294  0.336838  0.016916


          H
0  0.929695
1  0.444762
2  0.338168
3  0.635817

joined = df1.join(df2)[['foo','bar']]

说明:

您可以先加入您的DF:

In [47]: join = df1.join(df2)

In [48]: join
Out[48]:
        foo       bar                           foo                 bar  \
          A         D         E         F         B         C         G
0  0.176934  0.510407  0.085626  0.162183  0.694937  0.947164  0.382840
1  0.973283  0.028961  0.740759  0.330742  0.743907  0.886495  0.961932
2  0.898224  0.517563  0.026104  0.624047  0.966278  0.131551  0.848640
3  0.713660  0.718130  0.252294  0.336838  0.704461  0.419997  0.016916


          H
0  0.929695
1  0.444762
2  0.338168
3  0.635817

然后按所需顺序选择列(级别:0):

In [49]: join = join[['foo','bar']]

In [50]: join
Out[50]:
        foo                           bar                                \
          A         B         C         D         E         F         G
0  0.176934  0.694937  0.947164  0.510407  0.085626  0.162183  0.382840
1  0.973283  0.743907  0.886495  0.028961  0.740759  0.330742  0.961932
2  0.898224  0.966278  0.131551  0.517563  0.026104  0.624047  0.848640
3  0.713660  0.704461  0.419997  0.718130  0.252294  0.336838  0.016916


          H
0  0.929695
1  0.444762
2  0.338168
3  0.635817

相关问题 更多 >

    热门问题