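I want to merge two dataframes in Python, df1 and df2, on the two columns Site and Building. The frames have different numbers of rows, and the goal is to get the "Safe" value for every Generator row in df1. Below is some demo code. The dataframes I build in the example below seem to work, but in the real problem each table comes from an SQL query, which leads me to believe the merge is failing because of the data types.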
import pandas as pd
df = {'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Greece','Greece','Greece','Greece','Greece','Greece'],
'Building' : ['X1','X1','X1','X1','X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X4','X4','X4','X4','X4', 'X4','X5','X5','X5','X5','X5','X5','X1','X1', 'X1','X1', 'X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X1', 'X1','X1', 'X1','X1', 'X1'],
'Generator' : ['DE','NDE', 'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE', 'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4', 'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE', 'NDE','GBX1','GBX2','GBX3','GBX4', 'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4']}
df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])
df15 = {'Building' : ['X1','X2','X3','X4','X5','X1','X2','X3','X1'],
'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Greece'],
'Safe' : [1, 1, 1, 1, 1, 0, 1, 1, 0]}
df2 = pd.DataFrame(df15, columns = ['Site', 'Building', 'Safe'])
df3 = df1.merge(df2, how = 'left', on = ['Site', 'Building'], indicator = True)
I also tried changing each column's data type to string, following "pandas - Merging on string columns not working (bug?)":
df1['Site'] = df1['Site'].astype('str')
df1['Building']=df1['Building'].astype('str')
df1['Site'] = df1['Site'].astype('str')
df1['Building']=df1['Building'].astype('str')
I also tried the encoding check mentioned below, but everything seems to match, i.e.:
df1['Building'] = df1['Building'].str.encode('UTF-8')
df1['Site'] = df1['Site'].str.encode('UTF-8')
Data types:
df2.dtypes:
Site object
Building object
Safe object
dtype: object
df1.dtypes:
Building object
Site object
Generator object
dtype: object
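An object dtype only says that a column holds Python objects, so two columns can both report object and still contain values that do not compare equal. The sketch below is one possible way to look closer. The frames left and right, the helper key_report, and the trailing space in "Belgium " are all hypothetical stand-ins, since the real SQL data is not shown here.

```python
import pandas as pd

# Hypothetical stand-ins for the two SQL result sets. The trailing space in
# "Belgium " is an assumed defect, not something taken from the real data.
left = pd.DataFrame({"Site": ["Belgium ", "Holland"],
                     "Building": ["X1", "X1"],
                     "Generator": ["DE", "DE"]})
right = pd.DataFrame({"Site": ["Belgium", "Holland"],
                      "Building": ["X1", "X1"],
                      "Safe": [1, 0]})
keys = ["Site", "Building"]

def key_report(frame, cols):
    """Per key column: the Python types present and repr() of each unique value."""
    return {col: {"types": sorted({type(v).__name__ for v in frame[col]}),
                  "values": sorted(repr(v) for v in frame[col].unique())}
            for col in cols}

# repr() makes stray whitespace and bytes-vs-str differences visible.
print(key_report(left, keys))
print(key_report(right, keys))

# Key pairs present in `left` that have no exact counterpart in `right`.
missing = (set(zip(left["Site"], left["Building"]))
           - set(zip(right["Site"], right["Building"])))
print(missing)

# Stripping whitespace on both sides removes this particular mismatch.
for frame in (left, right):
    for col in keys:
        frame[col] = frame[col].astype(str).str.strip()
merged = left.merge(right, on=keys, how="left")
print(merged)
```

If missing comes back empty on the real data, the keys already match exactly and the cause lies elsewhere.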
I tried the following code:
df3 = df1.merge(df2, left_on = ['Site', 'Building'], right_on = ['Site', 'Building'], how = 'left', indicator = 'indicator')
or:
df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'left', indicator = 'indicator')
But the result contains only the data from the left side, i.e. result 1.
I tried the outer join below, which gives result 2:
df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'outer', indicator = 'indicator')
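The indicator column added by these merges can be tabulated to see how many rows matched and which key pairs did not. This is a minimal sketch on a small made-up sample. The Greece/X9 row is a hypothetical non-matching key added for illustration.

```python
import pandas as pd

# Small made-up sample; ("Greece", "X9") deliberately has no partner in df2.
df1 = pd.DataFrame({"Site": ["Belgium", "Holland", "Greece"],
                    "Building": ["X1", "X1", "X9"],
                    "Generator": ["DE", "NDE", "GBX1"]})
df2 = pd.DataFrame({"Site": ["Belgium", "Holland", "Greece"],
                    "Building": ["X1", "X1", "X1"],
                    "Safe": [1, 0, 0]})

df3 = df1.merge(df2, on=["Site", "Building"], how="left", indicator="indicator")

# Count rows per merge outcome: 'both', 'left_only', 'right_only'.
counts = df3["indicator"].value_counts()
print(counts)

# List the distinct key pairs from df1 that found no match in df2.
unmatched = (df3.loc[df3["indicator"] == "left_only", ["Site", "Building"]]
                .drop_duplicates())
print(unmatched)
```

If every row comes back as left_only, the key values themselves differ between the two frames, and the join type is not the problem.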
Apologies for my ignorance of pandas.
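I noticed a small error in the code you shared.
df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])
should be
df1 = pd.DataFrame(df, columns = ['Site', 'Building', 'Generator'])
The variable passed to pd.DataFrame should be df, not df1, because df1 has not been defined at that point. After this fix, simply merging the dataframes gives the desired result: with the demo data, every row of df1 finds a match in df2 and receives a Safe value.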
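As a rough check of the fix, here is a minimal sketch of the corrected flow. It uses a shortened version of the demo data (fewer sites, buildings and generators than in the question), so the shapes are illustrative only.

```python
import pandas as pd

# Shortened version of the demo data from the question.
df = {"Site": ["Belgium"] * 4 + ["Holland"] * 2 + ["Greece"] * 2,
      "Building": ["X1", "X1", "X2", "X2", "X1", "X1", "X1", "X1"],
      "Generator": ["DE", "NDE"] * 4}
# Pass the dict `df` here, not the not-yet-defined name `df1`.
df1 = pd.DataFrame(df, columns=["Site", "Building", "Generator"])

df15 = {"Building": ["X1", "X2", "X1", "X1"],
        "Site": ["Belgium", "Belgium", "Holland", "Greece"],
        "Safe": [1, 1, 0, 0]}
df2 = pd.DataFrame(df15, columns=["Site", "Building", "Safe"])

# A left merge keeps every df1 row and attaches the matching Safe value.
df3 = df1.merge(df2, how="left", on=["Site", "Building"], indicator=True)
print(df3)
```

With matching keys, every row should show both in the _merge column. If the real SQL-derived frames still give left_only, the key values themselves are worth inspecting.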