Merge on one column or another

Published 2024-05-20 17:09:11


I want to merge two DataFrames:

df1:

                                              cik0        cik1        cik2  
'MKTG, INC.'                            0001019056        None        None   
1 800 FLOWERS COM INC                   0001104659  0001437749        None   
11 GOOD ENERGY INC                      0000930413        None        None   
1347 CAPITAL CORP                       0001144204        None        None   
1347 PROPERTY INSURANCE HOLDINGS, INC.  0001387131        None        None 

df2:

              cik Ticker
0      0001144204   AABB
1      0001019056      A
2      0001387131   AABC
3      0001437749     AA
4      0000930413  AAACU

Expected result:

                                              cik0        cik1  cik2 ticker
'MKTG, INC.'                            0001019056        None  None      A
1 800 FLOWERS COM INC                   0001104659  0001437749  None     AA
11 GOOD ENERGY INC                      0000930413        None  None  AAACU
1347 CAPITAL CORP                       0001144204        None  None   AABB
1347 PROPERTY INSURANCE HOLDINGS, INC.  0001387131        None  None   AABC

I want to match cik0 against df2['cik']; if there is no match, I want to try cik1, and so on.

Thanks for your help!


2 answers

You can chain Series.map with Series.fillna a few times:

ticker_map = df2.set_index('cik')['Ticker']

df1['ticker'] = df1['cik0'].map(ticker_map)\
                           .fillna(df1['cik1'].map(ticker_map))\
                           .fillna(df1['cik2'].map(ticker_map))
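A self-contained version of the chain above, rebuilt from the sample frames in the question, can serve as a quick check. It assumes the CIKs are stored as strings (so the leading zeros survive) and that the empty cells are Python None; parentheses replace the backslash continuations, which does not change the behavior:

```python
import pandas as pd

# Sample data from the question; CIKs are kept as strings to preserve leading zeros.
df1 = pd.DataFrame(
    {
        "cik0": ["0001019056", "0001104659", "0000930413", "0001144204", "0001387131"],
        "cik1": [None, "0001437749", None, None, None],
        "cik2": [None, None, None, None, None],
    },
    index=[
        "'MKTG, INC.'",
        "1 800 FLOWERS COM INC",
        "11 GOOD ENERGY INC",
        "1347 CAPITAL CORP",
        "1347 PROPERTY INSURANCE HOLDINGS, INC.",
    ],
)
df2 = pd.DataFrame(
    {
        "cik": ["0001144204", "0001019056", "0001387131", "0001437749", "0000930413"],
        "Ticker": ["AABB", "A", "AABC", "AA", "AAACU"],
    }
)

# Series indexed by CIK, used as a lookup table by Series.map
ticker_map = df2.set_index("cik")["Ticker"]

# Try cik0 first; where it finds nothing (NaN), fall back to cik1, then to cik2.
df1["ticker"] = (
    df1["cik0"].map(ticker_map)
    .fillna(df1["cik1"].map(ticker_map))
    .fillna(df1["cik2"].map(ticker_map))
)
print(df1["ticker"].tolist())  # ['A', 'AA', 'AAACU', 'AABB', 'AABC']
```

fillna aligns on the index, which works here because every mapped Series shares df1's index.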

However, this gets tedious. You can define a function that does it iteratively:

def apply_map_on_cols(df, cols, mapper):
    # map the first column, then fill the remaining gaps column by column
    s = df[cols[0]].map(mapper)
    for col in cols[1:]:
        s = s.fillna(df[col].map(mapper))
    return s

df1['ticker'] = df1.pipe(apply_map_on_cols,
                         cols=[f'cik{i}' for i in range(3)],
                         mapper=df2.set_index('cik')['Ticker'])
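The loop in apply_map_on_cols is a left fold, so the same idea can also be sketched with functools.reduce. The helper name first_match is hypothetical, and the data is a three-row subset of the question's sample with CIKs assumed to be strings:

```python
from functools import reduce

import pandas as pd


def first_match(df, cols, mapper):
    """Hypothetical helper: map every column in `cols` through `mapper` and keep,
    row by row, the result from the first column that has a match."""
    return reduce(
        # fill the gaps left so far with the mapping of the next column
        lambda acc, col: acc.fillna(df[col].map(mapper)),
        cols[1:],
        df[cols[0]].map(mapper),  # start from the first column
    )


df1 = pd.DataFrame(
    {
        "cik0": ["0001019056", "0001104659", "0000930413"],
        "cik1": [None, "0001437749", None],
        "cik2": [None, None, None],
    },
    index=["'MKTG, INC.'", "1 800 FLOWERS COM INC", "11 GOOD ENERGY INC"],
)
df2 = pd.DataFrame(
    {"cik": ["0001019056", "0001437749", "0000930413"], "Ticker": ["A", "AA", "AAACU"]}
)

df1["ticker"] = df1.pipe(
    first_match,
    cols=[f"cik{i}" for i in range(3)],
    mapper=df2.set_index("cik")["Ticker"],
)
print(df1["ticker"].tolist())  # ['A', 'AA', 'AAACU']
```

As with the loop, the order of cols sets the priority: an earlier column always wins over a later one.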

Another option is to merge the DataFrames with pd.merge:

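import pandas as pd

dfs = []  # list to temporarily store the partially merged DataFrames
left = df1.reset_index()  # move the company names into an 'index' column so they survive the merge
for col in ['cik0', 'cik1', 'cik2']:  # iterate over the CIK columns only
    # inner merge: keeps only the rows whose value in `col` is found in df2['cik']
    dfs.append(pd.merge(left, df2, left_on=col, right_on='cik'))
# concatenate all partial results (cik0 matches first, then cik1, then cik2)
df_result = pd.concat(dfs, axis=0)
df_result = df_result.set_index('index')  # restore the original index
df_result = df_result[~df_result.index.duplicated(keep='first')]  # a row matching in several columns keeps its earliest match
df_result = df_result.drop(columns='cik')  # drop the 'cik' column that came from df2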
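The loop above runs one merge per column. A variant that needs only a single merge first reshapes df1 to long format with DataFrame.melt; this is a different technique from the answer's, sketched on the question's sample data under the assumption that CIKs are strings. The labels company and col are hypothetical names introduced for the reshaping:

```python
import pandas as pd

# Sample data from the question; CIKs are kept as strings to preserve leading zeros.
df1 = pd.DataFrame(
    {
        "cik0": ["0001019056", "0001104659", "0000930413", "0001144204", "0001387131"],
        "cik1": [None, "0001437749", None, None, None],
        "cik2": [None, None, None, None, None],
    },
    index=[
        "'MKTG, INC.'",
        "1 800 FLOWERS COM INC",
        "11 GOOD ENERGY INC",
        "1347 CAPITAL CORP",
        "1347 PROPERTY INSURANCE HOLDINGS, INC.",
    ],
)
df2 = pd.DataFrame(
    {
        "cik": ["0001144204", "0001019056", "0001387131", "0001437749", "0000930413"],
        "Ticker": ["AABB", "A", "AABC", "AA", "AAACU"],
    }
)

# Long format: one row per (company, CIK column) pair; missing CIKs are dropped.
long = (
    df1.rename_axis("company")
    .reset_index()
    .melt(id_vars="company", value_vars=["cik0", "cik1", "cik2"],
          var_name="col", value_name="cik")
    .dropna(subset=["cik"])
)

# A single merge, then keep the match from the earliest column for each company.
first_hit = (
    long.merge(df2, on="cik", how="inner")
    .sort_values("col", kind="stable")  # cik0 < cik1 < cik2
    .drop_duplicates("company")  # keep the first row per company
    .set_index("company")["Ticker"]
)

# Assigning a Series aligns on the index; companies without a match get NaN.
df1["ticker"] = first_hit
print(df1["ticker"].tolist())  # ['A', 'AA', 'AAACU', 'AABB', 'AABC']
```

Sorting by col before drop_duplicates gives cik0 priority over cik1 and cik2, mirroring the map/fillna chain; this relies on the column names sorting in priority order, which holds for cik0 through cik9.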

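When df1.shape[0] >> df1.shape[1] (">>" meaning "much greater than"), this should be several times faster than any of the map-based approaches, and that condition should hold for most real-world datasets.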
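The speed claim is easy to check on your own data. The sketch below times both approaches on synthetic data; the row count, the 50% missing rate, and the helper names with_map and with_merge are arbitrary assumptions. Timings depend on the pandas version and the data shape, so a single run does not confirm any particular speed-up:

```python
import timeit

import numpy as np
import pandas as pd

CIK_COLS = ["cik0", "cik1", "cik2"]

# Synthetic data (an assumption, not from the question): n companies, three CIK
# columns drawn from a pool of 10-digit strings, about half of cik1/cik2 missing.
rng = np.random.default_rng(0)
n = 50_000
pool = np.array([f"{i:010d}" for i in range(2 * n)], dtype=object)
big1 = pd.DataFrame(
    {col: rng.choice(pool, size=n) for col in CIK_COLS},
    index=[f"company {i}" for i in range(n)],
)
big1.loc[rng.random(n) < 0.5, "cik1"] = None
big1.loc[rng.random(n) < 0.5, "cik2"] = None
# Only the first half of the pool has a ticker, so some rows match nothing.
big2 = pd.DataFrame({"cik": pool[:n], "Ticker": [f"T{i}" for i in range(n)]})


def with_map(df, lookup):
    """map + fillna; the first matching column wins."""
    ticker_map = lookup.set_index("cik")["Ticker"]
    s = df[CIK_COLS[0]].map(ticker_map)
    for col in CIK_COLS[1:]:
        s = s.fillna(df[col].map(ticker_map))
    return s


def with_merge(df, lookup):
    """One merge per column, concatenated; duplicates keep the earliest column."""
    left = df.reset_index()
    parts = [pd.merge(left, lookup, left_on=col, right_on="cik") for col in CIK_COLS]
    out = pd.concat(parts).set_index("index")
    out = out[~out.index.duplicated(keep="first")]
    return out["Ticker"].reindex(df.index)


for name, func in [("map + fillna", with_map), ("merge + concat", with_merge)]:
    seconds = timeit.timeit(lambda: func(big1, big2), number=5) / 5
    print(f"{name}: {seconds:.4f} s per run")
```

Both helpers return the same tickers for every row, so any timing difference comes from the method rather than from different results.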
