Merging two dataframes of different sizes on two columns in Python

Posted 2024-10-06 12:19:19


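I want to merge two dataframes in Python, df1 and df2, on the two columns Site and Building. The frames have different numbers of rows, and the goal is to get the "Safe" value for each Generator value in df1. Below is demo code. Although the dataframes I created in the example below seem to work, in the real problem the data for each table comes from a SQL query, which makes me believe the merge is failing because of the data types.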

import pandas as pd

df = {'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Greece','Greece','Greece','Greece','Greece','Greece'],
        'Building' : ['X1','X1','X1','X1','X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X4','X4','X4','X4','X4','X4','X5','X5','X5','X5','X5','X5','X1','X1','X1','X1','X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X1','X1','X1','X1','X1','X1'],
        'Generator' : ['DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4']}

df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])


df15 = {'Building' : ['X1','X2','X3','X4','X5','X1','X2','X3','X1'],
        'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Greece'],
        'Safe' : [1, 1, 1, 1, 1, 0, 1, 1, 0]}


df2 = pd.DataFrame(df15, columns = ['Site', 'Building', 'Safe'])


df3 = df1.merge(df2, how = 'left', on = ['Site', 'Building'], indicator = True)

Desired outcome

Following pandas - Merging on string columns not working (bug?), I also tried converting the key columns of each dataframe to string:

df1['Site'] = df1['Site'].astype('str')
df1['Building']=df1['Building'].astype('str')
df2['Site'] = df2['Site'].astype('str')
df2['Building'] = df2['Building'].astype('str')

I also tried the encoding step mentioned there, but everything seems to match, i.e.:

df1['Building'] = df1['Building'].str.encode('UTF-8')
df1['Site'] = df1['Site'].str.encode('UTF-8')
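One caveat about the encoding step: Series.str.encode returns bytes objects, and in Python 3 b'Belgium' is never equal to 'Belgium'. If the key columns of only one frame are encoded, a merge therefore finds no matches even though both columns still report dtype object. A minimal sketch on hypothetical toy frames (not the real SQL data):

```python
import pandas as pd

# Hypothetical toy frames; only the right-hand Site column gets encoded.
left = pd.DataFrame({"Site": ["Belgium", "Holland"], "Building": ["X1", "X1"]})
right = pd.DataFrame({"Site": ["Belgium", "Holland"], "Building": ["X1", "X1"], "Safe": [1, 0]})
right["Site"] = right["Site"].str.encode("UTF-8")  # values are now bytes, e.g. b"Belgium"

# The dtype check alone does not reveal that one column now holds bytes.
print(left["Site"].dtype, right["Site"].dtype)

# str keys never equal bytes keys, so every row comes back 'left_only'.
mismatched = left.merge(right, on=["Site", "Building"], how="left", indicator=True)
print(mismatched["_merge"].tolist())

# Decoding back to str restores the match.
right["Site"] = right["Site"].str.decode("UTF-8")
fixed = left.merge(right, on=["Site", "Building"], how="left", indicator=True)
print(fixed["Safe"].tolist())
```

So if the encoding step is applied, it needs to be applied to both frames (or undone) before merging.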

Data types:

df2.dtypes:

    Site        object
    Building    object
    Safe        object
    dtype: object

df1.dtypes:

    Building     object
    Site         object
    Generator    object
    dtype: object
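Both frames report object for the key columns, so the dtypes alone cannot explain a failed merge; what matters is whether the key values are actually equal. One common cause with SQL-sourced text, assumed here purely for illustration, is padding such as trailing spaces from fixed-width CHAR columns. A minimal sketch of normalizing the keys before merging (the helper normalize_keys and the padded values are hypothetical):

```python
import pandas as pd


def normalize_keys(frame, cols):
    """Hypothetical helper: return a copy with the key columns cast to str and stripped."""
    out = frame.copy()
    for col in cols:
        out[col] = out[col].astype(str).str.strip()
    return out


keys = ["Site", "Building"]
# Hypothetical SQL-like data: the right-hand keys carry trailing spaces.
left = pd.DataFrame({"Site": ["Belgium", "Greece"], "Building": ["X1", "X1"], "Generator": ["DE", "NDE"]})
right = pd.DataFrame({"Site": ["Belgium ", "Greece "], "Building": ["X1  ", "X1  "], "Safe": [1, 0]})

raw = left.merge(right, on=keys, how="left")
print(raw["Safe"].isna().all())  # True: no key matched

clean = normalize_keys(left, keys).merge(normalize_keys(right, keys), on=keys, how="left")
print(clean["Safe"].tolist())  # [1, 0]
```

Note that astype(str) turns missing values into the string 'nan', so real data with NULL keys may need separate handling.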

I tried the following code:

df3 = df1.merge(df2, left_on = ['Site', 'Building'], right_on = ['Site', 'Building'], how = 'left', indicator = 'indicator')

or:

df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'left', indicator = 'indicator')

but the result only contains the data from the left frame, i.e. outcome 1:

Undesired outcome 1

I tried the outer join below, which gives outcome 2:

df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'outer', indicator = 'indicator')

Undesired outcome 2
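With indicator='indicator', the merge adds a categorical column named indicator whose values are left_only, right_only or both. Counting those values, and listing the distinct key pairs that did not end up as both, shows exactly which Site/Building combinations failed to match. A minimal sketch on hypothetical toy frames (the misspelled key is invented for illustration):

```python
import pandas as pd

keys = ["Site", "Building"]
df1 = pd.DataFrame({"Site": ["Belgium", "Belgium", "Holland"],
                    "Building": ["X1", "X2", "X1"],
                    "Generator": ["DE", "DE", "DE"]})
# Hypothetical right frame: "Holland" is misspelled, so that key cannot match.
df2 = pd.DataFrame({"Site": ["Belgium", "Belgium", "Hollnd"],
                    "Building": ["X1", "X2", "X1"],
                    "Safe": [1, 1, 0]})

df3 = df1.merge(df2, on=keys, how="outer", indicator="indicator")

# How many rows matched on both sides vs. only one side.
counts = df3["indicator"].value_counts()
print(counts)

# Distinct key pairs present on only one side: these are the mismatches to inspect.
unmatched = df3.loc[df3["indicator"] != "both", keys + ["indicator"]].drop_duplicates()
print(unmatched)
```

Printing the repr of the unmatched keys (for example unmatched["Site"].map(repr)) can also reveal invisible differences such as trailing spaces or bytes values.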

Apologies for my ignorance of pandas.


1 answer
Anonymous user
Answer 1 · posted 2024-10-06 12:19:19

I noticed a small error in the code you shared.

df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator']) should be df1 = pd.DataFrame(df, columns = ['Site', 'Building', 'Generator']). The variable passed to pd.DataFrame should be df, not df1.

After that fix, a plain merge of the dataframes gives the desired result:

pd.merge(df1,df2, on=['Building','Site'])
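The pd.merge call above uses the default how='inner'. With the sample data every Site/Building pair in df1 also exists in df2, so an inner join and a left join return the same 54 rows here; the sketch below uses a left join because it keeps every df1 row even if a key is missing from df2. The list comprehensions are an assumed shorthand that builds the same rows as the question's literal lists:

```python
import pandas as pd

generators = ["DE", "NDE", "GBX1", "GBX2", "GBX3", "GBX4"]
pairs = ([("Belgium", f"X{i}") for i in range(1, 6)]
         + [("Holland", f"X{i}") for i in range(1, 4)]
         + [("Greece", "X1")])

# Same rows as the question's df dict: six generators per (Site, Building) pair.
df = {
    "Site": [s for s, b in pairs for g in generators],
    "Building": [b for s, b in pairs for g in generators],
    "Generator": [g for s, b in pairs for g in generators],
}
df1 = pd.DataFrame(df, columns=["Site", "Building", "Generator"])  # pass df, not df1

df15 = {"Building": ["X1", "X2", "X3", "X4", "X5", "X1", "X2", "X3", "X1"],
        "Site": ["Belgium"] * 5 + ["Holland"] * 3 + ["Greece"],
        "Safe": [1, 1, 1, 1, 1, 0, 1, 1, 0]}
df2 = pd.DataFrame(df15, columns=["Site", "Building", "Safe"])

# Left join keeps every df1 row; indicator=True adds a _merge column.
df3 = df1.merge(df2, on=["Site", "Building"], how="left", indicator=True)
print(df3.head(7))
print(df3["_merge"].value_counts())
```

If the real SQL data still produces left_only rows after this fix, the key values themselves differ between the two tables.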

The output looks like this (first rows shown):

       Site Building Generator  Safe
0   Belgium       X1        DE     1
1   Belgium       X1       NDE     1
2   Belgium       X1      GBX1     1
3   Belgium       X1      GBX2     1
4   Belgium       X1      GBX3     1
5   Belgium       X1      GBX4     1
6   Belgium       X2        DE     1
7   Belgium       X2       NDE     1
8   Belgium       X2      GBX1     1
9   Belgium       X2      GBX2     1
10  Belgium       X2      GBX3     1
11  Belgium       X2      GBX4     1
12  Belgium       X3        DE     1
