Equivalent of VLOOKUP across multiple columns

Posted 2024-05-02 22:59:52


I want to return the total_points value for each user listed under several user columns.

To explain more clearly, here is the first dataframe (as a dict):

{'secondBoxer1': {0: 'Cody',
  1: 'Billy',
  2: 'Jennifer',
  3: 'Franc',
  4: 'Mark'},
 'secondBoxer2': {0: 'Tamis',
  1: 'Danye',
  2: 'Leesa',
  3: 'Hector',
  4: 'Coy'},
 'secondBoxer3': {0: 'Davin',
  1: 'Delbert',
  2: 'Kanisca',
  3: 'Luis',
  4: 'nan'},
 'secondBoxer4': {0: 'Caro',
  1: 'John',
  2: 'nan',
  3: 'Jose',
  4: 'nan'},
 'secondBoxer5': {0: 'Caro',
  1: 'Ryan',
  2: 'nan',
  3: 'Jose',
  4: 'nan'},
 'secondBoxer6': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'Luis', 4: 'nan'}}

I have six secondBoxer columns here. For each one, I want to merge it with the total_points column of another dataframe, matched on the names that appear under that secondBoxer column:

    name            total_points
0   Hector            50.000
1   John              48.000
2   Jose              30.000
3   Luis              31.875
4   Billy             27.500 

The desired output in this case is:

secondBoxer1  total_points1  secondBoxer2  total_points2  ....
  Cody                          Tamis
  Billy          27.500         Danye
  Jennifer                      Leesa
  Franc                         Hector        50.000
  Mark                          Coy

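I tried putting together a for loop that goes over all the columns (the real dataset has more than 50 secondBoxer columns) and merges each one with the second dataset to get total_points, but without success.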

listen = ['secondBoxer1','secondBoxer2','secondBoxer3','secondBoxer4','secondBoxer5','secondBoxer6']
for i in listen:
    df=df.merge(df2[['name','total_points']],left_on=i,right_on='name')

However, this returns an empty dataset.


2 answers

IIUC, map and then concat:

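# out: the secondBoxer dataframe; df: the lookup table with name / total_points
lookup=dict(zip(df.name,df.total_points))
out1=out.apply(lambda x : x.map(lookup))
# remove the literal prefix; str.strip('secondBoxer') strips a character set, not a prefix
out1.columns='total_points'+out1.columns.str.replace('secondBoxer','',regex=False)
out=pd.concat([out,out1],axis=1)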

Here we need argsort to reorder the columns by their number:

order=out.columns.str.extract(r'(\d+)')[0].astype(int).argsort(kind='stable')  # numeric and stable, so 2 sorts before 10
out=out.iloc[:,order.to_numpy()]

out
Out[151]: 
  secondBoxer1  total_points1  ... secondBoxer6  total_points6
0         Cody            NaN  ...          nan            NaN
1        Billy           27.5  ...          nan            NaN
2     Jennifer            NaN  ...          nan            NaN
3        Franc            NaN  ...         Luis         31.875
4         Mark            NaN  ...          nan            NaN
[5 rows x 12 columns]
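
For anyone who wants to run the map-then-concat idea end to end, here is a minimal, self-contained sketch built on the sample data from the question. The names `boxers`, `points`, `lookup`, `totals`, `combined`, `ordered` and the helper `column_number` are made up for illustration and are not from the answer above. It assumes the names in the lookup table are unique (mapping with a Series that has duplicate index values raises an error), and it sorts columns by their integer suffix so that it should also behave with 50+ columns.

```python
import re

import pandas as pd

# Sample data copied from the question.
boxers = pd.DataFrame({
    "secondBoxer1": ["Cody", "Billy", "Jennifer", "Franc", "Mark"],
    "secondBoxer2": ["Tamis", "Danye", "Leesa", "Hector", "Coy"],
    "secondBoxer3": ["Davin", "Delbert", "Kanisca", "Luis", "nan"],
    "secondBoxer4": ["Caro", "John", "nan", "Jose", "nan"],
    "secondBoxer5": ["Caro", "Ryan", "nan", "Jose", "nan"],
    "secondBoxer6": ["nan", "nan", "nan", "Luis", "nan"],
})
points = pd.DataFrame({
    "name": ["Hector", "John", "Jose", "Luis", "Billy"],
    "total_points": [50.0, 48.0, 30.0, 31.875, 27.5],
})

# Series indexed by name: this plays the role of the VLOOKUP table.
# Assumption: names are unique in `points`.
lookup = points.set_index("name")["total_points"]

# Map every boxer column through the lookup; names without a match become NaN.
totals = boxers.apply(lambda col: col.map(lookup))
totals.columns = [c.replace("secondBoxer", "total_points") for c in totals.columns]

combined = pd.concat([boxers, totals], axis=1)


def column_number(name):
    """Return the trailing integer of a column name, e.g. 'secondBoxer12' -> 12."""
    return int(re.search(r"\d+$", name).group())


# sorted() is stable, so each secondBoxerN stays ahead of its total_pointsN.
ordered = combined[sorted(combined.columns, key=column_number)]
print(ordered)
```

Sorting on the integer suffix rather than on the string avoids the case where "10" would sort before "2".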

Another approach:

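import re
import numpy as np
import pandas as pd
s=df2.set_index('name')['total_points']
# use the full numeric suffix (not only the last character) so this also works past 9 columns
final=df1.assign(**pd.DataFrame(np.where(df1.isin(s.index),df1.replace(s),np.nan)
                                ,columns=df1.columns.str.replace('secondBoxer','',regex=False)).add_prefix('total_points'))
print(final[sorted(final.columns,key=lambda x: int(re.search(r'\d+',x).group()))])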

  secondBoxer1 total_points1 secondBoxer2 total_points2 secondBoxer3  \
0         Cody           NaN        Tamis           NaN        Davin   
1        Billy          27.5        Danye           NaN      Delbert   
2     Jennifer           NaN        Leesa           NaN      Kanisca   
3        Franc           NaN       Hector            50         Luis   
4         Mark           NaN          Coy           NaN          nan   

  total_points3 secondBoxer4 total_points4 secondBoxer5 total_points5  \
0           NaN         Caro           NaN         Caro           NaN   
1           NaN         John            48         Ryan           NaN   
2           NaN          nan           NaN          nan           NaN   
3        31.875         Jose            30         Jose            30   
4           NaN          nan           NaN          nan           NaN   

  secondBoxer6 total_points6  
0          nan           NaN  
1          nan           NaN  
2          nan           NaN  
3         Luis        31.875  
4          nan           NaN 
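
As a side note on the loop from the question: `merge` defaults to an inner join (`how='inner'`), so each pass keeps only the rows whose name exists in the lookup table, and after a pass or two no rows are left. That is most likely why the result came back empty. Below is a hedged sketch of a left-join version of that loop, using a subset of the sample data. The names `merged` and `lookup` are made up for illustration, and it assumes names are unique in `df2` (duplicates would multiply rows).

```python
import pandas as pd

# A subset of the sample data from the question.
df = pd.DataFrame({
    "secondBoxer1": ["Cody", "Billy", "Jennifer", "Franc", "Mark"],
    "secondBoxer2": ["Tamis", "Danye", "Leesa", "Hector", "Coy"],
    "secondBoxer3": ["Davin", "Delbert", "Kanisca", "Luis", "nan"],
})
df2 = pd.DataFrame({
    "name": ["Hector", "John", "Jose", "Luis", "Billy"],
    "total_points": [50.0, 48.0, 30.0, 31.875, 27.5],
})

merged = df.copy()
for col in df.columns:
    suffix = col.replace("secondBoxer", "")
    # Rename the lookup columns so that the join key matches the boxer column
    # and each pass adds a distinctly named total_pointsN column.
    lookup = df2.rename(columns={"name": col, "total_points": f"total_points{suffix}"})
    # how="left" keeps every row of `merged`; unmatched names get NaN.
    merged = merged.merge(lookup, on=col, how="left")

print(merged)
```

With 50+ columns this runs one merge per column, so the map-based answers are probably the lighter option; the sketch is mainly meant to show what the original loop was missing.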
