合并数据帧问题

2024-06-26 01:41:46 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两个数据帧

第一个是

+------------------------------------------+
|       ID             CustomerType Choice |
+------------------------------------------+
| 0    1.0               Durability    OEM |
| 1    2.0                    Price    OEM |
| 2    3.0               Durability    OEM |
| 3    4.0               Durability    OEM |
| 4    5.0               Durability    OEM |
| 5    6.0  ManufacturerCredibility    OEM |
| 6    7.0                 Warranty    OEM |
| 7    8.0  ManufacturerCredibility    OEM |
| 8    9.0               Durability    OEM |
| 9   10.0                    Price    OEM |
| 10  11.0               Durability    TPN |
| 11  12.0                 Warranty    OEM |
| 12  13.0               Durability    TPN |
+------------------------------------------+

第二个是

 --------------------------------------------------------+
|        Price  Durability  Warranty  Manufacture   Type |
+--------------------------------------------------------+
| OEM     1.00         4.0       4.0          4.0    OEM |
| TPN     0.80         4.0       1.0          1.0    TPN |
| Reman   0.55         4.0       0.5          1.0  Reman |
| Reuse   0.45         2.5       0.0          1.0  Reuse |
+--------------------------------------------------------+

我需要通过使用第一个数据帧中的“Choice”和第二个数据帧中的“Type”来连接这两个数据帧

目前我正在使用

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')

有这样尴尬的结果

+------------------------------------------------------------------------------+
|     Price  Durability  Warranty  Manufacture             CustomerType Choice |
+------------------------------------------------------------------------------+
| 0     1.0         4.0       4.0          4.0               Durability    OEM |
| 1     1.0         4.0       4.0          4.0                    Price    OEM |
| 2     1.0         4.0       4.0          4.0               Durability    OEM |
| 3     1.0         4.0       4.0          4.0               Durability    OEM |
| 4     1.0         4.0       4.0          4.0               Durability    OEM |
| 5     1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
| 6     1.0         4.0       4.0          4.0                 Warranty    OEM |
| 7     1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
| 8     1.0         4.0       4.0          4.0               Durability    OEM |
| 9     1.0         4.0       4.0          4.0                    Price    OEM |
| 10    1.0         4.0       4.0          4.0                 Warranty    OEM |
| 11    1.0         4.0       4.0          4.0                    Price    OEM |
| 12    1.0         4.0       4.0          4.0                 Warranty    OEM |
| 13    1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
+------------------------------------------------------------------------------+

从结果表中我们可以看到,所有行都包含来自第二个数据帧的OEM数据。我做错什么了


Tags: 数据ontypeleftpricechoicewarrantyreuse
3条回答

我从您的示例数据中得到了一点不同的输出,最后一行和结尾的3.rd行合并正确,而且left join不通过合并列进行排序:

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')
print (data)
      ID             CustomerType Choice  ...   Warranty  Manufacture  Type
0    1.0               Durability    OEM  ...        4.0          4.0   OEM
1    2.0                    Price    OEM  ...        4.0          4.0   OEM
2    3.0               Durability    OEM  ...        4.0          4.0   OEM
3    4.0               Durability    OEM  ...        4.0          4.0   OEM
4    5.0               Durability    OEM  ...        4.0          4.0   OEM
5    6.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
6    7.0                 Warranty    OEM  ...        4.0          4.0   OEM
7    8.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
8    9.0               Durability    OEM  ...        4.0          4.0   OEM
9   10.0                    Price    OEM  ...        4.0          4.0   OEM
10  11.0               Durability    TPN  ...        1.0          1.0   TPN
11  12.0                 Warranty    OEM  ...        4.0          4.0   OEM
12  13.0               Durability    TPN  ...        1.0          1.0   TPN

[13 rows x 8 columns]

如果排序似乎使用inner join(默认值):

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type')
#same as
#data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='inner')
print (data)
      ID             CustomerType Choice  ...   Warranty  Manufacture  Type
0    1.0               Durability    OEM  ...        4.0          4.0   OEM
1    2.0                    Price    OEM  ...        4.0          4.0   OEM
2    3.0               Durability    OEM  ...        4.0          4.0   OEM
3    4.0               Durability    OEM  ...        4.0          4.0   OEM
4    5.0               Durability    OEM  ...        4.0          4.0   OEM
5    6.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
6    7.0                 Warranty    OEM  ...        4.0          4.0   OEM
7    8.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
8    9.0               Durability    OEM  ...        4.0          4.0   OEM
9   10.0                    Price    OEM  ...        4.0          4.0   OEM
10  12.0                 Warranty    OEM  ...        4.0          4.0   OEM
11  11.0               Durability    TPN  ...        1.0          1.0   TPN
12  13.0               Durability    TPN  ...        1.0          1.0   TPN

[13 rows x 8 columns]

相关问题 更多 >