Merging 3 different DataFrames based on conditions

Posted 2024-10-01 04:51:07

How can I combine three DataFrames as shown below?

The first two must be related through ID1, since that is the matching key between those two DataFrames.

For the third DataFrame, Address2 must match in order to add the Hash.

df1:

Name1   Name2  Name3   Address    ID1     ID2    Own
Matt    John1  Jill     878 home   1       0     Deal
Matt    John2  Jack     879 home   2       1     Dael

df2:

Name1   ID1   Address   Name4     Address2
Matt    1     878 home  face1     face\123
Matt    1     878 home  face2     face\345
Matt    1     878 home  face3     face\678    
Matt    2     879 home  head1     head\123
Matt    2     879 home  head2     head\345
Matt    2     879 home  head3     head\678

df3:

Address2     Hash
face\123     abc123
face\345     cde321
face\678     efg123
head\123     123efg
head\345     efg321
head\678     acd321

I am trying to merge the three DataFrames into one, like this:

Name1   Name2   ID1 Address     Own    Name3    ID2 Name4   Address2    Hash
Matt    John1   1   878 home    Deal    Jill    0   face1   face\123    abc123
Matt    John1   1   878 home    Deal    Jill    0   face2   face\345    cde321
Matt    John1   1   878 home    Deal    Jill    0   face3   face\678    efg123
Matt    John2   2   879 home    Dael    Jack    1   head1   head\123    123efg
Matt    John2   2   879 home    Dael    Jack    1   head2   head\345    efg321
Matt    John2   2   879 home    Dael    Jack    1   head3   head\678    acd321

Between df1 and df2 the key is ID1. Between df2 and df3 the key is Address2.

Thanks a lot for your help.


3 answers

I think this will work. The merge function does most of the work for you, given the columns you want to join on.

import numpy as np
import pandas as pd

# First row of each array holds the column names; the rest are data rows.
# Note: np.array stores every value as a string here, including ID1 and ID2.
data = np.array([['Name1', 'Name2', 'Name3', 'Address', 'ID1', 'ID2', 'Own'],
                 ['Matt', 'John1', 'Jill', '878 home', '1', '0', 'Deal'],
                 ['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']])

# Address2 values use '.' in place of the backslash from the question.
data2 = np.array([['Name1', 'ID1', 'Address', 'Name4', 'Address2'],
                  ['Matt', '1', '878 home', 'face1', 'face.123'],
                  ['Matt', '1', '878 home', 'face2', 'face.345'],
                  ['Matt', '1', '878 home', 'face3', 'face.678'],
                  ['Matt', '2', '879 home', 'head1', 'head.123'],
                  ['Matt', '2', '879 home', 'head2', 'head.345'],
                  ['Matt', '2', '879 home', 'head3', 'head.678']])

data3 = np.array([['Address2', 'Hash'],
                  ['face.123', 'abc123'],
                  ['face.345', 'cde321'],
                  ['face.678', 'efg123'],
                  ['head.123', '123efg'],
                  ['head.345', 'efg321'],
                  ['head.678', 'acd321']])

df1 = pd.DataFrame(data=data[1:, :], columns=data[0, :])
df2 = pd.DataFrame(data=data2[1:, :], columns=data2[0, :])
df3 = pd.DataFrame(data=data3[1:, :], columns=data3[0, :])

# Join df1 to df2 on ID1, then attach the hashes from df3 on Address2.
# Name1 and Address exist in both df1 and df2, so they come out with _x/_y suffixes.
Cdf = pd.merge(df1, df2, on='ID1', how='inner')
Ddf = pd.merge(Cdf, df3, on='Address2', how='inner')
print(Ddf)

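Judging from the desired output, nothing needs to be specified beyond the default behavior, which merges on the intersection of the two frames' columns.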

>>> df1.merge(df2).merge(df3)

  Name1  Name2 Name3  Address  ID1  ID2   Own  Name4  Address2    Hash
0  Matt  John1  Jill  878 home    1    0  Deal  face1  face\123  abc123
1  Matt  John1  Jill  878 home    1    0  Deal  face2  face\345  cde321
2  Matt  John1  Jill  878 home    1    0  Deal  face3  face\678  efg123
3  Matt  John2  Jack  879 home    2    1  Dael  head1  head\123  123efg
4  Matt  John2  Jack  879 home    2    1  Dael  head2  head\345  efg321
5  Matt  John2  Jack  879 home    2    1  Dael  head3  head\678  acd321
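
To make the default behavior concrete, here is a minimal sketch that rebuilds the question's frames from plain dicts (an assumption about how the data is loaded) and compares the no-argument merge with one that names the shared columns explicitly. The names shared_cols, implicit and explicit are illustrative only. Raw strings are used for Address2 so that the backslash is kept literally.

```python
import pandas as pd

# Frames rebuilt from the question; the dict construction is an assumption.
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "ID2": [0, 1],
    "Own": ["Deal", "Dael"],
})
addr2 = [r"face\123", r"face\345", r"face\678",
         r"head\123", r"head\345", r"head\678"]
df2 = pd.DataFrame({
    "Name1": ["Matt"] * 6,
    "ID1": [1, 1, 1, 2, 2, 2],
    "Address": ["878 home"] * 3 + ["879 home"] * 3,
    "Name4": ["face1", "face2", "face3", "head1", "head2", "head3"],
    "Address2": addr2,
})
df3 = pd.DataFrame({
    "Address2": addr2,
    "Hash": ["abc123", "cde321", "efg123", "123efg", "efg321", "acd321"],
})

# Columns present in both df1 and df2: these are the default join keys.
shared_cols = [c for c in df1.columns if c in df2.columns]

# With no `on`, merge joins on every column the two frames share.
implicit = df1.merge(df2).merge(df3)

# Spelling the keys out gives the same frame.
explicit = df1.merge(df2, on=shared_cols).merge(df3, on="Address2")

print(shared_cols)
print(implicit.equals(explicit))
```

Relying on the default is convenient, but naming the keys guards against an unrelated column that happens to share a name silently becoming part of the join.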

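Specifying a single column to merge on, as the accepted answer does, actually causes a problem, because you end up with suffixed columns.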

>>> df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")

  Name1_x  Name2 Name3 Address_x  ID1  ID2   Own Name1_y Address_y  Name4  \
0    Matt  John1  Jill   878home    1    0  Deal    Matt   878home  face1   
1    Matt  John1  Jill   878home    1    0  Deal    Matt   878home  face2   
2    Matt  John1  Jill   878home    1    0  Deal    Matt   878home  face3   
3    Matt  John2  Jack   879home    2    1  Dael    Matt   879home  head1   
4    Matt  John2  Jack   879home    2    1  Dael    Matt   879home  head2   
5    Matt  John2  Jack   879home    2    1  Dael    Matt   879home  head3   

   Address2    Hash  
0  face\123  abc123  
1  face\345  cde321  
2  face\678  efg123  
3  head\123  123efg  
4  head\345  efg321  
5  head\678  acd321 
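
If you still prefer to join on ID1 alone, one way to avoid the suffixes is to drop the repeated columns from df2 before merging. The sketch below uses trimmed-down versions of the question's frames; the names overlap, suffixed and clean are illustrative, and the approach assumes the repeated columns (Name1, Address) hold the same values in both frames.

```python
import pandas as pd

# Trimmed-down versions of the question's frames (two df2 rows per ID1).
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt"] * 4,
    "ID1": [1, 1, 2, 2],
    "Address": ["878 home", "878 home", "879 home", "879 home"],
    "Name4": ["face1", "face2", "head1", "head2"],
})

# Merging on ID1 alone duplicates Name1 and Address with _x/_y suffixes.
suffixed = df1.merge(df2, on="ID1")

# Columns of df2 that df1 already has, other than the join key itself.
overlap = [c for c in df2.columns if c in df1.columns and c != "ID1"]

# Dropping them first leaves ID1 as the only shared column, so no suffixes appear.
clean = df1.merge(df2.drop(columns=overlap), on="ID1")

print(list(suffixed.columns))
print(list(clean.columns))
```

If the repeated columns could disagree between the frames, keeping them and joining on all of them is the safer choice.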

Take a look at the merge function; its documentation includes some examples. For your specific problem, try the following:

combined_df = df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")
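
As a hedged extension of the chained merge, pandas' validate argument can check that the keys relate the way the question describes (each ID1 appears once in df1, each Address2 once in df3). The frames below are reduced stand-ins for the question's data, and misspelled_key_raised is an illustrative name. Column names are case-sensitive, so a misspelled key raises KeyError rather than being matched loosely.

```python
import pandas as pd

# Reduced stand-ins for the question's frames; only the key columns matter here.
df1 = pd.DataFrame({"ID1": [1, 2], "Own": ["Deal", "Dael"]})
df2 = pd.DataFrame({
    "ID1": [1, 1, 2, 2],
    "Address2": [r"face\123", r"face\345", r"head\123", r"head\345"],
})
df3 = pd.DataFrame({
    "Address2": [r"face\123", r"face\345", r"head\123", r"head\345"],
    "Hash": ["abc123", "cde321", "123efg", "efg321"],
})

# validate raises MergeError if the key relationship is not what we expect.
combined_df = (
    df1.merge(df2, on="ID1", how="inner", validate="one_to_many")
       .merge(df3, on="Address2", how="inner", validate="many_to_one")
)

# Key names must match the column labels exactly, including case.
try:
    df1.merge(df2, on="Id1")
    misspelled_key_raised = False
except KeyError:
    misspelled_key_raised = True

print(combined_df)
print(misspelled_key_raised)
```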
