Efficiently merging two DataFrames across columns and rows in Pandas

Published 2024-06-28 10:57:01


I have a detailed Transaction data DataFrame, shown below:

import pandas as pd

df_col = pd.DataFrame({'SQ':[1,1,2],
                       'City':['A','A','B'],
                       'Date':['7-1-2020','7-2-2020','7-1-2020'],
                       'Loc 1':[40,21,27],
                       'Loc 2':[37,40,14],
                       'Loc 3':[49,38,36],
                       'Loc 4':[20,14,18],
                       'Loc 5':[48,27,36]})


+----+------+----------+-------+-------+-------+-------+-------+
| SQ | City |   Date   | Loc 1 | Loc 2 | Loc 3 | Loc 4 | Loc 5 |
+----+------+----------+-------+-------+-------+-------+-------+
|  1 |   A  | 7-1-2020 |   40  |   37  |   49  |   20  |   48  |
+----+------+----------+-------+-------+-------+-------+-------+
|  1 |   A  | 7-2-2020 |   21  |   40  |   38  |   14  |   27  |
+----+------+----------+-------+-------+-------+-------+-------+
|  2 |   B  | 7-1-2020 |   27  |   14  |   36  |   18  |   36  |
+----+------+----------+-------+-------+-------+-------+-------+

I also have a separate Location data DataFrame, shown below:

+------------+--------------+
| LocationNo | LocationType |
+------------+--------------+
|    Loc 1   |    Class A   |
+------------+--------------+
|    Loc 2   |    Class A   |
+------------+--------------+
|    Loc 3   |    Class B   |
+------------+--------------+
|    Loc 4   |    Class C   |
+------------+--------------+
|    Loc 5   |    Class C   |
+------------+--------------+

df_row = pd.DataFrame({'LocationNo':['Loc 1','Loc 2','Loc 3','Loc 4','Loc 5'],
                       'LocationType':['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

Now my task is to merge the columns of df_col with the rows of df_row and sum the values. The merge goes from vertical to horizontal, i.e. columns to rows.

My desired output is as follows:

+----+------+----------+---------+---------+---------+
| SQ | City |   Date   | Class A | Class B | Class C |
+----+------+----------+---------+---------+---------+
|  1 |   A  | 7-1-2020 |    77   |    49   |    68   |
+----+------+----------+---------+---------+---------+
|  1 |   A  | 7-2-2020 |    61   |    38   |    41   |
+----+------+----------+---------+---------+---------+
|  2 |   B  | 7-1-2020 |    41   |    36   |    54   |
+----+------+----------+---------+---------+---------+

I wrote the following code:

# setting the index
df_col.set_index(['SQ','City','Date'], inplace=True)
df_row.set_index('LocationNo', inplace=True)


# The location names are column labels in df_col, so transpose it to merge them against the df_row index
df_final = df_col.T.merge(df_row, left_index=True, right_index=True, how='left').groupby('LocationType').agg('sum').T

The code above produces the following output:

+------------------+---------+---------+---------+
|   LocationType   | Class A | Class B | Class C |
+------------------+---------+---------+---------+
| (1, A, 7-1-2020) |    77   |    49   |    68   |
+------------------+---------+---------+---------+
| (1, A, 7-2-2020) |    61   |    38   |    41   |
+------------------+---------+---------+---------+
| (2, B, 7-1-2020) |    41   |    36   |    54   |
+------------------+---------+---------+---------+

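The values are correct. However, the first three columns are merged into a single column. I need to split it into separate columns, as in the desired output above.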

How should I solve this? What is an efficient approach?


2 answers

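Let us filter the Loc-like columns from df_col, then map these columns to LocationType based on LocationNo in df_row, and finally group the dataframe on these mapped columns along axis=1, aggregate with sum, and join the result back to the SQ, City and Date columns: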

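# Assumes df_col and df_row as originally defined, i.e. before the set_index calls in the question
d = df_col.filter(like='Loc')
g = d.columns.map(df_row.set_index('LocationNo')['LocationType'])
# groupby(..., axis=1) is deprecated since pandas 2.1; grouping the transpose (d.T.groupby(g).sum().T) is the usual replacement
out = df_col[['SQ','City','Date']].join(d.groupby(g, axis=1).sum())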

   SQ City      Date  Class A  Class B  Class C
0   1    A  7-1-2020       77       49       68
1   1    A  7-2-2020       61       38       41
2   2    B  7-1-2020       41       36       54
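
On recent pandas releases, where grouping along axis=1 is deprecated, the same idea can be sketched by grouping the transposed frame instead. This is a minimal sketch rather than the answerer's exact code: the names loc_to_class, labels and class_sums are introduced here for illustration, and it assumes df_col and df_row are built as in the question, before any set_index call.

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

# Lookup table (illustrative name): location column name -> location class
loc_to_class = df_row.set_index('LocationNo')['LocationType']

# Keep only the Loc columns and label each one with its class
d = df_col.filter(like='Loc')
labels = d.columns.map(loc_to_class).to_numpy()

# Group the rows of the transposed frame by class, sum, and transpose back
class_sums = d.T.groupby(labels).sum().T

# Re-attach the identifying columns; both frames share the default integer index
out = df_col[['SQ', 'City', 'Date']].join(class_sums)
print(out)
```

Because the identifying columns are never moved into the index, SQ, City and Date stay as three separate columns, which is what the question asks for.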

One approach using melt, merge and groupby:

print (df_col.melt(id_vars=["SQ", "City", "Date"], var_name="LocationNo")
             .merge(df_row, how="left", on="LocationNo")
             .groupby(["SQ", "City","LocationType", "Date"])["value"].sum()
             .unstack("LocationType"))

LocationType      Class A  Class B  Class C
SQ City Date                               
1  A    7-1-2020       77       49       68
        7-2-2020       61       38       41
2  B    7-1-2020       41       36       54
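
The result above keeps SQ, City and Date in a MultiIndex. If flat columns like the desired output are needed, a reset_index call can follow the unstack. The sketch below is one way to do that under the same assumptions as the question (unmodified df_col and df_row); long_df and flat are illustrative names, not part of the original answer.

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

# Wide -> long: one row per (SQ, City, Date, LocationNo), then attach the class
long_df = (df_col.melt(id_vars=['SQ', 'City', 'Date'], var_name='LocationNo')
                 .merge(df_row, how='left', on='LocationNo'))

# Sum per class, spread the classes back into columns, and flatten the index
flat = (long_df.groupby(['SQ', 'City', 'Date', 'LocationType'])['value'].sum()
               .unstack('LocationType')
               .reset_index())
flat.columns.name = None  # drop the leftover 'LocationType' label on the column axis
print(flat)
```

Note that groupby sorts the keys, so the rows come back ordered by SQ, City and Date (compared as strings here) rather than in their original order.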
