Pandas: handling multiple merges when a join comes back empty

Posted 2024-09-02 23:31:22

I have three DataFrames:

Frame 1:

import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.nan,'US2'],'ISIN':[np.nan,"EU1",np.nan]}
ip = pd.DataFrame(data=d)
ip

Frame 2:

cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)
cusipmap

Frame 3:

isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)
isinmap

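I want to add a column to frame 1 by first joining on CUSIP against frame 2 and then on ISIN against frame 3, keeping the number of rows in frame 1 unchanged (a left join). However, I only want the ISIN join against frame 3 to apply where the join against frame 2 produced NaN. So the output I'm looking for is something like: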

op = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.nan,'US2'],'ISIN':[np.nan,"EU1",np.nan],'ID':['a1','b1','c1']}
op = pd.DataFrame(data=op)
op

Does pd.merge have a feature that does this?


2 answers

You can use Series.map to create the new Series, then use Series.fillna or Series.combine_first to replace the missing values:

s1 = ip['instrument'].map(cusipmap.set_index('CUSIP')['ID'])
s2 = ip['instrument'].map(isinmap.set_index('ISIN')['ID'])
ip['Id'] = s1.fillna(s2)
print(ip)
  instrument CUSIP ISIN  Id
0          a   US1  NaN  a1
1          b   NaN  EU1  b1
2          c   US2  NaN  c1
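If more than two lookup tables are involved, the same map-then-fillna idea can be wrapped in a loop. Below is a minimal sketch; the helper name lookup_with_fallback is hypothetical (not part of pandas), and it assumes each lookup Series has a unique index, as Series.map requires.

```python
import numpy as np
import pandas as pd

ip = pd.DataFrame({
    "instrument": ["a", "b", "c"],
    "CUSIP": ["US1", np.nan, "US2"],
    "ISIN": [np.nan, "EU1", np.nan],
})
cusipmap = pd.DataFrame({"CUSIP": ["a", "c"], "ID": ["a1", "c1"]})
isinmap = pd.DataFrame({"ISIN": ["b", "c"], "ID": ["b1", "c2"]})


def lookup_with_fallback(keys, lookups):
    """Hypothetical helper: map keys through each lookup, earlier lookups take priority."""
    mapped = [keys.map(lookup) for lookup in lookups]
    result = mapped[0]
    for candidate in mapped[1:]:
        # fillna only touches rows that are still missing, so earlier matches are kept
        result = result.fillna(candidate)
    return result


ids = lookup_with_fallback(
    ip["instrument"],
    [cusipmap.set_index("CUSIP")["ID"], isinmap.set_index("ISIN")["ID"]],
)
print(ids.tolist())
```

Instrument c appears in both lookup tables; because the CUSIP lookup comes first in the list, its value is the one that survives.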

Solution using merge:

s1 = ip.merge(cusipmap.rename(columns={'CUSIP':'instrument'}),on='instrument',how='left')['ID']
s2 = ip.merge(isinmap.rename(columns={'ISIN':'instrument'}),on='instrument',how='left')['ID']

ip['Id'] = s1.combine_first(s2)
print(ip)
  instrument CUSIP ISIN  Id
0          a   US1  NaN  a1
1          b   NaN  EU1  b1
2          c   US2  NaN  c1
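One caveat with the merge version: a left join only keeps the row count of the left frame if the keys on the right are unique. A minimal sketch of guarding against that with the validate parameter of DataFrame.merge follows; the duplicated lookup table dup is made up purely for illustration.

```python
import numpy as np
import pandas as pd

ip = pd.DataFrame({
    "instrument": ["a", "b", "c"],
    "CUSIP": ["US1", np.nan, "US2"],
    "ISIN": [np.nan, "EU1", np.nan],
})
cusipmap = pd.DataFrame({"CUSIP": ["a", "c"], "ID": ["a1", "c1"]})

# validate="many_to_one" asks merge to check that the right-hand keys are unique
merged = ip.merge(
    cusipmap.rename(columns={"CUSIP": "instrument"}),
    on="instrument",
    how="left",
    validate="many_to_one",
)
print(len(merged) == len(ip))

# Illustrative only: a lookup table where key 'a' appears twice
dup = pd.concat([cusipmap, cusipmap.iloc[[0]]], ignore_index=True)
duplicate_detected = False
try:
    ip.merge(
        dup.rename(columns={"CUSIP": "instrument"}),
        on="instrument",
        how="left",
        validate="many_to_one",
    )
except pd.errors.MergeError as exc:
    # Without validate, this merge would silently return an extra row for 'a'
    duplicate_detected = True
    print("duplicate keys detected:", exc)
```

Note also that the merged result carries a fresh default index, so assigning its ID column back to ip lines up only because ip uses the default RangeIndex here.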

My approach is to merge in both lookup DataFrames, then use .loc to build the required column. Example:

import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.nan,'US2'],'ISIN':[np.nan,"EU1",np.nan]}
ip = pd.DataFrame(data=d)

cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)

isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)


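# Left-join both lookup tables on instrument; the two ID columns become IDCUSIP and IDISIN.
df = (ip
      .merge(cusipmap.rename(columns={"CUSIP": "instrument"}), how="left", on="instrument")
      .merge(isinmap.rename(columns={"ISIN": "instrument"}), how="left", on="instrument", suffixes=("CUSIP", "ISIN")))

# Start from the CUSIP-based ID, then fill only the still-missing rows from the ISIN-based ID.
df["ID"] = df["IDCUSIP"]
missing = df["ID"].isnull()
df.loc[missing, "ID"] = df.loc[missing, "IDISIN"]
df = df.drop(columns=["IDCUSIP", "IDISIN"])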
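As a sanity check, here is a compact variant of the same double merge, compared against the frame the question asks for. It is a sketch rather than the answer's exact code: it swaps the .loc assignment for Series.combine_first, and the suffixes "_cusip" and "_isin" are arbitrary names chosen for this example.

```python
import numpy as np
import pandas as pd

ip = pd.DataFrame({
    "instrument": ["a", "b", "c"],
    "CUSIP": ["US1", np.nan, "US2"],
    "ISIN": [np.nan, "EU1", np.nan],
})
cusipmap = pd.DataFrame({"CUSIP": ["a", "c"], "ID": ["a1", "c1"]})
isinmap = pd.DataFrame({"ISIN": ["b", "c"], "ID": ["b1", "c2"]})

df = (
    ip
    .merge(cusipmap.rename(columns={"CUSIP": "instrument"}), how="left", on="instrument")
    .merge(isinmap.rename(columns={"ISIN": "instrument"}), how="left", on="instrument",
           suffixes=("_cusip", "_isin"))
)

# Prefer the CUSIP-based ID and fall back to the ISIN-based ID where it is missing
df["ID"] = df["ID_cusip"].combine_first(df["ID_isin"])
df = df.drop(columns=["ID_cusip", "ID_isin"])

# The output the question asks for
expected = pd.DataFrame({
    "instrument": ["a", "b", "c"],
    "CUSIP": ["US1", np.nan, "US2"],
    "ISIN": [np.nan, "EU1", np.nan],
    "ID": ["a1", "b1", "c1"],
})
pd.testing.assert_frame_equal(df, expected)
print(df)
```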
