Merging DataFrames in Python on a column and a time difference of 1 minute

Posted 2024-09-27 23:22:51


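I am new to Python and would like help merging the following two DataFrames, subject to two conditions:

a) The numeric columns df1.UL_GTP_TEID_0_int and df2.TEID_UL_int must have the same value.

b) The difference between df1.START_TIME_roundoff and df2.TS_START_roundoff must be at most 1 minute.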

df1
     START_TIME_roundoff  UL_GTP_TEID_0_int    TIMSI/MSIN
46   2020-03-10 12:00:00         1196907781  3.240371e+09
190  2020-03-10 12:01:00         1147678181  3.244522e+09
308  2020-03-10 12:05:00         1147678181  3.244522e+09
496  2020-03-10 12:07:00         1691830165  3.252351e+09
632  2020-03-10 12:12:00         1809929829  3.237458e+09

df2
     TS_START_roundoff  TEID_UL_int  DIR
1  2020-10-03 09:59:00   1973380469    1
2  2020-10-03 10:00:00   2041336357    2
4  2020-10-03 12:06:00   1147678181   12
5  2020-10-03 09:57:00   1295205669    1
6  2020-10-03 12:12:00   1809929829   13

Expected output:
row 308 of df1 should merge with row 4 of df2,
and row 632 of df1 should merge with row 6 of df2.


Logic:

If |df1.START_TIME_roundoff - df2.TS_START_roundoff| <= 1 minute, then
df_new = pd.merge(df_1, df_2, how='inner', left_on='UL_GTP_TEID_0_int', right_on='TEID_UL_int')

Thanks in advance.


1 answer
User
Answer 1 · posted 2024-09-27 23:22:51

Using the following DataFrames:

import pandas as pd

# Sample data: B and E hold timestamps as strings, C and F hold the matching IDs
df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': ['2019-01-01 10:00:00', '2019-01-02 12:20:00', '2019-01-01 10:00:00'],
                    'C': ['ID1', 'ID2', 'ID3']})
df2 = pd.DataFrame({'D': ['D1', 'D2', 'D3'],
                    'E': ['2019-01-01 10:00:59', '2019-01-02 12:21:20', '2019-01-01 09:59:30'],
                    'F': ['ID1', 'ID2', 'ID3']})

If I understand your conditions for joining the two datasets correctly, the following code seems to do the job:

df = pd.merge(df1, df2, left_on='C', right_on='F', how='inner')  # match rows on the ID columns
df['B'] = pd.to_datetime(df['B'])  # ensure it's datetime
df['E'] = pd.to_datetime(df['E'])  # ensure it's datetime
df['delta'] = (df['B'] - df['E']).abs() / pd.Timedelta(minutes=1)  # absolute difference in minutes
result = df.query("delta <= 1").drop(columns='delta')  # keep pairs at most 1 minute apart

The output is the merged DataFrame containing ID1 and ID3.
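The same merge-then-filter idea can be tried on the column names from the question. What follows is only a sketch under stated assumptions: `merge_within` is a hypothetical helper name, and the df2 timestamps are written with the date 2020-03-10, on the assumption that the 2020-10-03 shown in the question is a day/month mix-up (as printed, no pair of rows would fall within 1 minute). The TIMSI/MSIN column is left out for brevity.

```python
import pandas as pd

def merge_within(left, right, left_key, right_key, left_time, right_time,
                 max_gap=pd.Timedelta(minutes=1)):
    """Hypothetical helper: inner-join on the key columns, then keep only
    the pairs whose timestamps are at most `max_gap` apart."""
    merged = left.merge(right, left_on=left_key, right_on=right_key, how='inner')
    gap = (pd.to_datetime(merged[left_time]) - pd.to_datetime(merged[right_time])).abs()
    return merged[gap <= max_gap]

# Rows copied from the question
df_1 = pd.DataFrame({
    'START_TIME_roundoff': ['2020-03-10 12:00:00', '2020-03-10 12:01:00',
                            '2020-03-10 12:05:00', '2020-03-10 12:07:00',
                            '2020-03-10 12:12:00'],
    'UL_GTP_TEID_0_int': [1196907781, 1147678181, 1147678181,
                          1691830165, 1809929829],
}, index=[46, 190, 308, 496, 632])

# ASSUMPTION: dates changed from 2020-10-03 to 2020-03-10 so they line up with df_1
df_2 = pd.DataFrame({
    'TS_START_roundoff': ['2020-03-10 09:59:00', '2020-03-10 10:00:00',
                          '2020-03-10 12:06:00', '2020-03-10 09:57:00',
                          '2020-03-10 12:12:00'],
    'TEID_UL_int': [1973380469, 2041336357, 1147678181, 1295205669, 1809929829],
    'DIR': [1, 2, 12, 1, 13],
}, index=[1, 2, 4, 5, 6])

df_new = merge_within(df_1, df_2,
                      'UL_GTP_TEID_0_int', 'TEID_UL_int',
                      'START_TIME_roundoff', 'TS_START_roundoff')
print(df_new)
```

Under that date assumption, the 12:05 row pairs with the 12:06 row and the two 12:12 rows pair with each other, while the 12:01 row is dropped because it is 5 minutes away from its only key match. Note that `merge` resets the index, so the original row labels (308, 632, 4, 6) do not survive into `df_new`.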

This may not be a perfect solution; someone more experienced could probably do it in a single line of code.
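For what it is worth, one way the steps might be folded into a single chained expression is sketched below. It reuses the sample frames from this answer and assumes "within 1 minute" means an absolute difference of at most 60 seconds; it is one possible shortening, not necessarily the most idiomatic one.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': ['2019-01-01 10:00:00', '2019-01-02 12:20:00', '2019-01-01 10:00:00'],
                    'C': ['ID1', 'ID2', 'ID3']})
df2 = pd.DataFrame({'D': ['D1', 'D2', 'D3'],
                    'E': ['2019-01-01 10:00:59', '2019-01-02 12:21:20', '2019-01-01 09:59:30'],
                    'F': ['ID1', 'ID2', 'ID3']})

# Merge on the ID columns, then keep rows whose timestamps differ by <= 1 minute.
# The lambda receives the merged frame, so no temporary 'delta' column is needed.
result = (
    df1.merge(df2, left_on='C', right_on='F', how='inner')
       .loc[lambda d: (pd.to_datetime(d['B']) - pd.to_datetime(d['E'])).abs()
                      <= pd.Timedelta(minutes=1)]
)
print(result)
```

Because the inner merge pairs every row that shares a key before the time filter runs, the intermediate frame can grow large when keys repeat often; for big inputs it may be worth looking at `pd.merge_asof` with a `tolerance`, which matches only the nearest row per key.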
