Duplicate rows when merging DataFrames

Posted 2024-09-29 23:30:09


I have three DataFrames:

import pandas as pd

maindf = pd.DataFrame({'Risk':['AB','AC','AD'],'amnt':[100,200,300]})

maindf
Out[4]: 
  Risk  amnt
0   AB   100
1   AC   200
2   AD   300

disc = pd.DataFrame({'Risk':['AB','AB','AB','AC','AC','AD'], 'discPerc':[0.4,0.5,0.1,0.5,0.5,1]})

disc
Out[7]: 
  Risk  discPerc
0   AB       0.4
1   AB       0.5
2   AB       0.1
3   AC       0.5
4   AC       0.5
5   AD       1.0

ops = pd.DataFrame({'Risk':['AB','AB','AC','AC','AD','AD'], 'opsPerc':[0.5,0.5,0.4,0.6,0.2,0.8]})

ops
Out[9]: 
  Risk  opsPerc
0   AB      0.5
1   AB      0.5
2   AC      0.4
3   AC      0.6
4   AD      0.2
5   AD      0.8

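I want to join these DataFrames to maindf so that, if I group by the "Risk" column, discPerc and opsPerc each still sum to 1 per Risk (as they do in the disc and ops DataFrames).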
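The premise that each Risk's percentages already sum to 1 in disc and ops can be checked directly. This is a minimal sketch that rebuilds the two frames from the question; np.allclose is used because the totals are floating-point sums.

```python
import numpy as np
import pandas as pd

disc = pd.DataFrame({'Risk': ['AB', 'AB', 'AB', 'AC', 'AC', 'AD'],
                     'discPerc': [0.4, 0.5, 0.1, 0.5, 0.5, 1]})
ops = pd.DataFrame({'Risk': ['AB', 'AB', 'AC', 'AC', 'AD', 'AD'],
                    'opsPerc': [0.5, 0.5, 0.4, 0.6, 0.2, 0.8]})

# Total percentage per Risk in each source table
disc_totals = disc.groupby('Risk')['discPerc'].sum()
ops_totals = ops.groupby('Risk')['opsPerc'].sum()

# Each group should add up to 1 (compared with a tolerance for float rounding)
print(np.allclose(disc_totals, 1.0), np.allclose(ops_totals, 1.0))  # True True
```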

A simple double left join results in:

merged = pd.merge(maindf,disc,on='Risk',how='left')

merged = pd.merge(merged, ops, on='Risk', how='left')

merged
Out[19]: 
   Risk  amnt  discPerc  opsPerc
0    AB   100       0.4      0.5
1    AB   100       0.4      0.5
2    AB   100       0.5      0.5
3    AB   100       0.5      0.5
4    AB   100       0.1      0.5
5    AB   100       0.1      0.5
6    AC   200       0.5      0.4
7    AC   200       0.5      0.6
8    AC   200       0.5      0.4
9    AC   200       0.5      0.6
10   AD   300       1.0      0.2
11   AD   300       1.0      0.8
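The extra rows come from the fact that both joins are many-to-many on Risk: each maindf row is paired with every matching disc row, and each of those with every matching ops row, so a key ends up with (disc rows) × (ops rows) rows, e.g. 3 × 2 = 6 for AB. The sketch below reproduces those counts and, as an optional safeguard, passes merge's validate parameter so pandas raises a MergeError instead of multiplying rows silently; the 'many_to_one' value is just an illustrative choice.

```python
import pandas as pd
from pandas.errors import MergeError

maindf = pd.DataFrame({'Risk': ['AB', 'AC', 'AD'], 'amnt': [100, 200, 300]})
disc = pd.DataFrame({'Risk': ['AB', 'AB', 'AB', 'AC', 'AC', 'AD'],
                     'discPerc': [0.4, 0.5, 0.1, 0.5, 0.5, 1]})
ops = pd.DataFrame({'Risk': ['AB', 'AB', 'AC', 'AC', 'AD', 'AD'],
                    'opsPerc': [0.5, 0.5, 0.4, 0.6, 0.2, 0.8]})

merged = (maindf.merge(disc, on='Risk', how='left')
                .merge(ops, on='Risk', how='left'))

# Rows per Risk after the double join = (disc rows) x (ops rows) for that Risk
expected = (disc['Risk'].value_counts() * ops['Risk'].value_counts()).sort_index()
actual = merged['Risk'].value_counts().sort_index()
print(expected.tolist(), actual.tolist())  # [6, 4, 2] [6, 4, 2]

# validate='many_to_one' requires unique keys on the right (ops), so it raises here
try:
    maindf.merge(disc, on='Risk', how='left').merge(
        ops, on='Risk', how='left', validate='many_to_one')
except MergeError as err:
    print('MergeError:', err)
```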

Grouping it gives:

merged.groupby('Risk').sum()
Out[20]: 
      amnt  discPerc  opsPerc
Risk                         
AB     600       2.0      3.0
AC     800       2.0      2.0
AD     600       2.0      1.0
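The grouped totals follow from that duplication. Within a key, each discPerc value appears once per matching ops row and each opsPerc value once per matching disc row, so the sum of discPerc equals 1 × (number of ops rows) and the sum of opsPerc equals 1 × (number of disc rows); for AB that is 2.0 and 3.0. A short sketch that checks this relationship against the grouped result:

```python
import numpy as np
import pandas as pd

maindf = pd.DataFrame({'Risk': ['AB', 'AC', 'AD'], 'amnt': [100, 200, 300]})
disc = pd.DataFrame({'Risk': ['AB', 'AB', 'AB', 'AC', 'AC', 'AD'],
                     'discPerc': [0.4, 0.5, 0.1, 0.5, 0.5, 1]})
ops = pd.DataFrame({'Risk': ['AB', 'AB', 'AC', 'AC', 'AD', 'AD'],
                    'opsPerc': [0.5, 0.5, 0.4, 0.6, 0.2, 0.8]})

merged = (maindf.merge(disc, on='Risk', how='left')
                .merge(ops, on='Risk', how='left'))
grouped = merged.groupby('Risk').sum()  # index sorted: AB, AC, AD

# How many rows each Risk has in the source tables (same AB, AC, AD order)
n_disc = disc['Risk'].value_counts().sort_index()
n_ops = ops['Risk'].value_counts().sort_index()

# discPerc (total 1 per Risk) is repeated once per matching ops row,
# opsPerc once per matching disc row
print(np.allclose(grouped['discPerc'].to_numpy(), n_ops.to_numpy()))  # True
print(np.allclose(grouped['opsPerc'].to_numpy(), n_disc.to_numpy()))  # True
```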

Instead, I would like the merged DataFrame to look like:

   Risk  amnt  discPerc  opsPerc
0    AB   100       0.4      NaN
1    AB   100       0.5      NaN
2    AB   100       0.1      NaN
3    AB   100       NaN      0.5
4    AB   100       NaN      0.5
5    AC   200       0.5      NaN
6    AC   200       0.5      NaN
7    AC   200       NaN      0.4
8    AC   200       NaN      0.6
9    AD   300       1.0      NaN
10   AD   300       NaN      0.2
11   AD   300       NaN      0.8

so that the percentages add up to 1. Any ideas?


1 answer
Anonymous user
#1 · Posted 2024-09-29 23:30:09

You can concatenate disc and ops, then merge the result with the original DataFrame:

pd.concat((disc, ops)).merge(maindf)
Out: 
   Risk  discPerc  opsPerc  amnt
0    AB       0.4      NaN   100
1    AB       0.5      NaN   100
2    AB       0.1      NaN   100
3    AB       NaN      0.5   100
4    AB       NaN      0.5   100
5    AC       0.5      NaN   200
6    AC       0.5      NaN   200
7    AC       NaN      0.4   200
8    AC       NaN      0.6   200
9    AD       1.0      NaN   300
10   AD       NaN      0.2   300
11   AD       NaN      0.8   300
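A minimal end-to-end sketch of the answer's idea, with two adjustments that are assumptions of this sketch rather than part of the original answer. First, it joins from maindf with how='left', so every Risk in maindf is kept even if it has no disc or ops rows (the answer's .merge(maindf) is an inner join and would drop such a Risk). Second, a left join follows maindf's key order; pandas 2.2 changed inner joins with sort=False to preserve the left frame's key order, so on recent versions the answer's one-liner may return the rows in concatenation order (all disc rows, then all ops rows) rather than grouped by Risk as shown above.

```python
import numpy as np
import pandas as pd

maindf = pd.DataFrame({'Risk': ['AB', 'AC', 'AD'], 'amnt': [100, 200, 300]})
disc = pd.DataFrame({'Risk': ['AB', 'AB', 'AB', 'AC', 'AC', 'AD'],
                     'discPerc': [0.4, 0.5, 0.1, 0.5, 0.5, 1]})
ops = pd.DataFrame({'Risk': ['AB', 'AB', 'AC', 'AC', 'AD', 'AD'],
                    'opsPerc': [0.5, 0.5, 0.4, 0.6, 0.2, 0.8]})

# Stack the two percentage tables: disc rows get NaN in opsPerc and vice versa
stacked = pd.concat((disc, ops), ignore_index=True)

# Left join from maindf: keeps every Risk in maindf and follows its key order
result = maindf.merge(stacked, on='Risk', how='left')
print(result)

# groupby().sum() skips NaN, so each percentage column totals 1 per Risk
totals = result.groupby('Risk')[['discPerc', 'opsPerc']].sum()
print(np.allclose(totals, 1.0))  # True
```

Note that amnt is still repeated on every row (five times for AB), so summing it after this join overcounts; take the first value per group, or aggregate it from maindf instead.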
