new_target column based on the third value

Posted 2024-05-02 18:02:23

I have a DataFrame:

source    target              
jan       feb                               
mar       apr                 
jun                       
feb       aug                                            
apr       jul                                            
oct       dec                     
aug       nov       
dec       may                               

The output DataFrame should be:

source    target    new_target              
jan       feb       aug                        
mar       apr       jul                  
jun                              
feb       aug       nov                                     
apr       jul       jul                                           
oct       dec       may              
aug       nov       nov
dec       may       may

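So the new_target column holds the third value in the chain. For example, following the source-to-target links gives jan -> feb -> aug -> nov; aug is the third value, so it is the new_target output for the jan row.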
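The lookup described above can be sketched in plain Python before reaching for pandas. This is only an illustration of the idea: the names `rows`, `next_hop`, `second_hop` and the `keep_target_if_missing` flag are hypothetical, and the empty string is assumed to stand for the blank target of the jun row.

```python
# Rows from the question as (source, target) pairs; "" marks the blank target.
rows = [
    ("jan", "feb"), ("mar", "apr"), ("jun", ""), ("feb", "aug"),
    ("apr", "jul"), ("oct", "dec"), ("aug", "nov"), ("dec", "may"),
]

# Lookup table: each source points at its target.
next_hop = {source: target for source, target in rows}


def second_hop(target, keep_target_if_missing=True):
    """Return the target of `target`, i.e. the third value in the chain."""
    # When `target` never appears as a source, either keep it or return "".
    fallback = target if keep_target_if_missing else ""
    return next_hop.get(target, fallback)


new_target = [second_hop(target) for _, target in rows]
print(new_target)
```

With `keep_target_if_missing=True` the result corresponds to the first expected table; passing `False` leaves the cell blank when there is no further hop.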

Edit:

source    target    new_target              
jan       feb       aug                        
mar       apr       jul                  
jun                              
feb       aug       nov                                     
apr       jul                                                  
oct       dec       may              
aug       nov       
dec       may       

2 Answers

Use map with a Series created by set_index, then use fillna to fall back to the original target where there is no match:

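# Series that maps each source to its target
s = df.set_index('source')['target']
# If source may contain duplicates, deduplicate first so the index is unique:
#s = df.drop_duplicates('source').set_index('source')['target']
# Look up each target as a source; keep the target itself where nothing matches
df['new_target'] = df['target'].map(s).fillna(df['target'])
print(df)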
  source target new_target
0    jan    feb        aug
1    mar    apr        jul
2    jun                  
3    feb    aug        nov
4    apr    jul        jul
5    oct    dec        may
6    aug    nov        nov
7    dec    may        may
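For readers who want to run the approach end to end, here is a self-contained sketch. The DataFrame construction is an assumption (the question only shows the printed table), with the empty string standing in for the blank target; `lookup` is just an illustrative name for the Series called `s` above.

```python
import pandas as pd

# Sample data from the question; "" stands for the missing target of jun.
df = pd.DataFrame({
    "source": ["jan", "mar", "jun", "feb", "apr", "oct", "aug", "dec"],
    "target": ["feb", "apr", "", "aug", "jul", "dec", "nov", "may"],
})

# Series indexed by source. drop_duplicates keeps the first row per source,
# so the index is unique before it is handed to map().
lookup = df.drop_duplicates("source").set_index("source")["target"]

# Treat each target as a source and look up its own target;
# where there is no match (NaN), fall back to the target itself.
df["new_target"] = df["target"].map(lookup).fillna(df["target"])
print(df)
```

The deduplication step is harmless here because every source is unique in the sample data; it only matters when the same source appears on several rows.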

Edit:

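# Series that maps each source to its target
s = df.set_index('source')['target']
# If source may contain duplicates, deduplicate first so the index is unique:
#s = df.drop_duplicates('source').set_index('source')['target']
# Without fillna, targets that never appear as a source become NaN
df['new_target'] = df['target'].map(s)
print(df)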
  source target new_target
0    jan    feb        aug
1    mar    apr        jul
2    jun               NaN
3    feb    aug        nov
4    apr    jul        NaN
5    oct    dec        may
6    aug    nov        NaN
7    dec    may        NaN
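
Alternatively, build a dictionary from source to target and look each target up in it, falling back to the target itself:

d = df.dropna().set_index('source').target.to_dict()
df['new_target'] = df.target.apply(lambda x: d.get(x, x))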

    source  target  new_target
0   jan     feb     aug
1   mar     apr     jul
2   jun 
3   feb     aug     nov
4   apr     jul     jul
5   oct     dec     may
6   aug     nov     nov
7   dec     may     may
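To match the edited expected output in the question, where rows without a further hop stay blank, the dictionary lookup can use an empty string as the default instead of the target. The following is a sketch under the same assumptions as the question's table (the empty string represents the blank target, and the DataFrame is rebuilt by hand); `hops` is a hypothetical name.

```python
import pandas as pd

# Sample data from the question; "" stands for the missing target of jun.
df = pd.DataFrame({
    "source": ["jan", "mar", "jun", "feb", "apr", "oct", "aug", "dec"],
    "target": ["feb", "apr", "", "aug", "jul", "dec", "nov", "may"],
})

# Plain dict from each source to its target.
hops = df.set_index("source")["target"].to_dict()

# Look up each target as a source; leave the cell blank when it is not one.
df["new_target"] = df["target"].map(lambda t: hops.get(t, ""))
print(df)
```

Using `""` rather than NaN keeps the column purely string-valued, which mirrors the blank cells in the edited table.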
