Check whether one DataFrame exists in another DataFrame

Overall:

Time                 ID_1    ID_2
2020-02-25 09:24:14  140209  81625000
2020-02-25 09:24:14  140216  91625000
2020-02-25 09:24:18  140219  80250000
2020-02-25 09:24:18  140221  90250000
25/02/2020 09:42:02  143982  39075000

df2:

ID_1    ID_2      Time                 Match?
140209  81625000  25/02/2020 09:24:14  no_match
143983  44075000  25/02/2020 09:42:02  no_match
143982  39075000  25/02/2020 09:42:02  match
143984  39075000  25/02/2020 09:42:02  no_match

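Attempt:

import numpy as np
import pandas as pd

# The key columns (on=…..) and the dropped columns ([...]) were left unspecified
Overall_1 = pd.merge(Overall, df2, on=….., how='left', indicator='Exist')
Overall_1.drop([...], inplace=True, axis=1)
Overall_1['Exist'] = np.where((Overall_1['Exist'] == 'both') & (Overall_1['Match?'] == 'match'), 'yes', 'no')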

Desired result:

Time                 ID_1    ID_2      Exist
2020-02-25 09:24:14  140209  81625000  No
2020-02-25 09:24:14  140216  91625000  NaN
2020-02-25 09:24:18  140219  80250000  NaN
2020-02-25 09:24:18  140221  90250000  NaN
25/02/2020 09:42:02  143982  39075000  Yes
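The merge-based answers compare Time directly, so it must have the same type and format in both frames; here some timestamps are written 2020-02-25 09:24:14 and others 25/02/2020 09:42:02, and comparing the raw strings would miss matches. A minimal sketch, assuming the data arrive as strings in exactly these two layouts, that rebuilds the sample frames and normalizes Time (`parse_time` is a hypothetical helper, not part of pandas):

```python
import pandas as pd


def parse_time(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse timestamps written as YYYY-MM-DD or DD/MM/YYYY."""
    iso = pd.to_datetime(s, format="%Y-%m-%d %H:%M:%S", errors="coerce")
    dmy = pd.to_datetime(s, format="%d/%m/%Y %H:%M:%S", errors="coerce")
    # Use the ISO parse where it succeeded, otherwise fall back to day-first
    return iso.fillna(dmy)


df1 = pd.DataFrame({  # the question's "Overall"
    "Time": ["2020-02-25 09:24:14", "2020-02-25 09:24:14", "2020-02-25 09:24:18",
             "2020-02-25 09:24:18", "25/02/2020 09:42:02"],
    "ID_1": [140209, 140216, 140219, 140221, 143982],
    "ID_2": [81625000, 91625000, 80250000, 90250000, 39075000],
})
df2 = pd.DataFrame({
    "ID_1": [140209, 143983, 143982, 143984],
    "ID_2": [81625000, 44075000, 39075000, 39075000],
    "Time": ["25/02/2020 09:24:14", "25/02/2020 09:42:02",
             "25/02/2020 09:42:02", "25/02/2020 09:42:02"],
    "Match?": ["no_match", "no_match", "match", "no_match"],
})

df1["Time"] = parse_time(df1["Time"])
df2["Time"] = parse_time(df2["Time"])
print(df1.dtypes["Time"], df2.dtypes["Time"])
```

Parsing with explicit formats and `errors="coerce"` avoids guessing whether 02/03 means day-first or month-first.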

2 Answers

Answer 1 (anonymous user) · edited 2024-06-03 15:10:14

You could try:

df_diff = pd.concat([Overall, df2]).drop_duplicates(keep=False)
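The idea behind this one-liner: stacking the two frames and dropping every row that occurs more than once leaves only rows that appear in exactly one of them. Because df2 carries an extra Match? column, the rows only line up if the comparison is restricted to shared columns. A minimal sketch, assuming the keys are ID_1, ID_2 and Time (already parsed to datetimes) and that neither frame has internal duplicates:

```python
import pandas as pd

keys = ["ID_1", "ID_2", "Time"]

df1 = pd.DataFrame({  # the question's "Overall", Time already normalized
    "Time": pd.to_datetime(["2020-02-25 09:24:14", "2020-02-25 09:24:14",
                            "2020-02-25 09:24:18", "2020-02-25 09:24:18",
                            "2020-02-25 09:42:02"]),
    "ID_1": [140209, 140216, 140219, 140221, 143982],
    "ID_2": [81625000, 91625000, 80250000, 90250000, 39075000],
})
df2 = pd.DataFrame({
    "ID_1": [140209, 143983, 143982, 143984],
    "ID_2": [81625000, 44075000, 39075000, 39075000],
    "Time": pd.to_datetime(["2020-02-25 09:24:14"] + ["2020-02-25 09:42:02"] * 3),
    "Match?": ["no_match", "no_match", "match", "no_match"],
})

# Compare only the key columns; df2's extra 'Match?' column would otherwise
# make every stacked row unique. keep=False drops *all* copies of a duplicate.
df_diff = pd.concat([df1[keys], df2[keys]]).drop_duplicates(keep=False)
print(df_diff)
```

The result is a symmetric difference: it mixes unmatched rows from both frames and carries no Exist column, so it tells you which rows lack a partner but not the yes/no/NaN labelling the question asks for.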

Answer 2 (anonymous user) · edited 2024-06-03 15:10:14

Use merge and np.select:

import numpy as np
import pandas as pd

# df1 = Overall; Time must have the same dtype (e.g. datetime64) in df1 and df2
df3 = pd.merge(df1, df2, on=['ID_1', 'ID_2', 'Time'], how='left', indicator='Exists')

col1 = df3['Match?']
col2 = df3['Exists']   # 'both' if the row was found in df2, otherwise 'left_only'

conditions = [col1.eq('match') & col2.eq('both'),
              col1.eq('no_match') & col2.eq('both')]

choices = ['yes', 'no']

# default=None keeps unmatched rows as missing values; a default of np.nan
# next to string choices may be coerced to the string 'nan' by np.select
df3['Exists'] = np.select(conditions, choices, default=None)

print(df3.drop('Match?', axis=1))


                 Time    ID_1      ID_2 Exists
0 2020-02-25 09:24:14  140209  81625000     no
1 2020-02-25 09:24:14  140216  91625000   None
2 2020-02-25 09:24:18  140219  80250000   None
3 2020-02-25 09:24:18  140221  90250000   None
4 2020-02-25 09:42:02  143982  39075000    yes
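What `indicator` contributes: with `how='left'`, pandas labels each output row 'both' when its keys were found in the right frame and 'left_only' otherwise, and the conditions above combine that label with Match?. A minimal sketch on two rows of the question's data (Time already parsed) that shows the intermediate column before np.select overwrites it:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Time": pd.to_datetime(["2020-02-25 09:24:14", "2020-02-25 09:24:18"]),
    "ID_1": [140209, 140219],
    "ID_2": [81625000, 80250000],
})
df2 = pd.DataFrame({
    "ID_1": [140209],
    "ID_2": [81625000],
    "Time": pd.to_datetime(["2020-02-25 09:24:14"]),
    "Match?": ["no_match"],
})

df3 = pd.merge(df1, df2, on=["ID_1", "ID_2", "Time"], how="left", indicator="Exists")
# 'Exists' is categorical ('left_only', 'right_only', 'both'); with how='left'
# only 'both' and 'left_only' can occur. Unmatched rows get NaN in 'Match?'.
print(df3[["ID_1", "Match?", "Exists"]])
```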

Or, more simply, use .merge followed by .replace with a dict:

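# The left merge leaves 'Match?' as NaN for rows with no partner in df2;
# only the two labels need renaming
df3 = (pd.merge(df1, df2, on=['ID_1', 'ID_2', 'Time'], how='left')
         .replace({'no_match': 'no', 'match': 'yes'})
         .rename(columns={'Match?': 'Exists'}))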

print(df3)

                 Time    ID_1      ID_2 Exists
0 2020-02-25 09:24:14  140209  81625000     no
1 2020-02-25 09:24:14  140216  91625000    NaN
2 2020-02-25 09:24:18  140219  80250000    NaN
3 2020-02-25 09:24:18  140221  90250000    NaN
4 2020-02-25 09:42:02  143982  39075000    yes
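The replace-based version works because the left merge already fills Match? with NaN wherever df2 has no partner row. A variant of the same idea swaps DataFrame.replace for Series.map, so only the Match? column is touched (a 'match' string in any other column would be left alone). A minimal sketch on the question's rows, with Time already parsed:

```python
import pandas as pd

df1 = pd.DataFrame({  # the question's "Overall"
    "Time": pd.to_datetime(["2020-02-25 09:24:14", "2020-02-25 09:24:14",
                            "2020-02-25 09:24:18", "2020-02-25 09:24:18",
                            "2020-02-25 09:42:02"]),
    "ID_1": [140209, 140216, 140219, 140221, 143982],
    "ID_2": [81625000, 91625000, 80250000, 90250000, 39075000],
})
df2 = pd.DataFrame({
    "ID_1": [140209, 143983, 143982, 143984],
    "ID_2": [81625000, 44075000, 39075000, 39075000],
    "Time": pd.to_datetime(["2020-02-25 09:24:14"] + ["2020-02-25 09:42:02"] * 3),
    "Match?": ["no_match", "no_match", "match", "no_match"],
})

df3 = pd.merge(df1, df2, on=["ID_1", "ID_2", "Time"], how="left")
# pop() removes 'Match?'; map() sends values missing from the dict
# (here the NaN produced by the left merge) to NaN
df3["Exists"] = df3.pop("Match?").map({"match": "yes", "no_match": "no"})
print(df3)
```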
