Is there a way to join the two data tables below in Python?

Posted 2024-10-02 00:39:11

Table 1 has 800,000 entries:

     End_time       DAY  Exceed  C_time   stn  max    start_time
2019-12-26 12:29:34 PROD -41.9   21.1     501  21.1   2019-12-26 12:29:13 
2019-12-26 12:30:59 PROD -10.3   52.7     501  52.7   2019-12-26 12:30:07 
2019-12-26 12:32:36 PROD -35.8   27.2     503  27.2   2019-12-26 12:32:09 
2019-12-26 12:33:54 PROD -53.3   9.7      504  9.7    2019-12-26 12:33:45 
2019-12-26 12:35:04 PROD -24.6   38.4     505  38.4   2019-12-26 12:34:26 

Table 2 has 300,000 entries:

AlarmMessage  D_time Priority Station EquipID  Active Quality LineName   AlarmInTimeStamp
S501LH_B_RR_BT   2       1       501    2200505   True   192     BC1       2019-12-26 12:29:16.5608495 
SHT_B_S503_BEAM 21       1       503    2300249   True   192     BC1       2019-12-26 12:32:20.0634165  
S503LH_B_RR_T    2       1       503    2200505   True   192     BC1       2019-12-26 12:32:25.6494806 
SHT_B_S504_     21       1       504    2300256   True   192     BC1       2019-12-26 12:33:50.6719676 

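If a Table 2 row's AlarmInTimeStamp falls between a Table 1 row's start_time and End_time, and the station is the same in both tables (stn in Table 1, Station in Table 2), the two rows should be merged. That way I can eventually count how many alarms were generated within each time window and sum their D_time.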

The output should look like this:

     End_time       DAY  Exceed  C_time   stn  max    start_time           AlarmMessage     D_time
2019-12-26 12:29:34 PROD -41.9   21.1     501  21.1   2019-12-26 12:29:13  S501LH_B_RR_BT     2
2019-12-26 12:30:59 PROD -10.3   52.7     501  52.7   2019-12-26 12:30:07       -             -
2019-12-26 12:32:36 PROD -35.8   27.2     503  27.2   2019-12-26 12:32:09  SHT_B_S503_BEAM    21
                                                                           S503LH_B_RR_T      2
2019-12-26 12:33:54 PROD -53.3   9.7      504  9.7    2019-12-26 12:33:45  SHT_B_S504         21   
2019-12-26 12:35:04 PROD -24.6   38.4     505  38.4   2019-12-26 12:34:26         -           -

1 Answer
User
#1 · Posted 2024-10-02 00:39:11

You can solve this with pandas and a bit of matrix multiplication:

import pandas as pd

# Create the pandas DataFrames (similar to an R data.frame)
myDataDF = pd.DataFrame({'Record': range(1, 6), 'SomeValue': [10, 8, 14, 6, 2]})
linkTableDF = pd.DataFrame({'ValueOfInterest': ['a', 'b', 'c'],
                            'LowerBound': [1, 4, 10],
                            'UpperBound': [3, 5, 16]})
# Set the index of linkTableDF (similar to setting row names)
linkTableDF = linkTableDF.set_index('ValueOfInterest')
# Apply a function to each row of linkTableDF. It checks which values in
# myDataDF fall between that row's lower and upper bound, returning
# 5 True/False values (the length of myDataDF)
mask = linkTableDF.apply(
    lambda r: myDataDF.SomeValue.between(r['LowerBound'], r['UpperBound']),
    axis=1)
# mask is a 3 (length of linkTableDF) by 5 matrix of True/False values;
# transposing it turns the row names (ValueOfInterest) into column names
mask = mask.T
# Matrix-multiply mask by its column names so that each record picks up
# the label of the range that contains it
myDataDF['ValueOfInterest'] = mask.dot(mask.columns)

In your case, you could use:

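# table is Table 1 and table2 is Table 2; the time columns must be datetimes.
# The station check is added so that only alarms from the same station match.
mask = table.apply(lambda r: table2.AlarmInTimeStamp.between(r['start_time'], r['End_time'])
                   & (table2.Station == r['stn']), axis=1)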
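
Note that with 800,000 and 300,000 rows, a full True/False mask would hold about 240 billion cells, so the apply approach may not fit in memory. As a possible alternative (not from the linked article), here is a minimal sketch using pd.merge_asof on the sample rows from the question. It assumes that time windows for the same station never overlap, so each alarm can belong to at most one window; the helper ts and the column window_id are made-up names for illustration.

```python
import pandas as pd


def ts(values):
    # Parse strings and force one common resolution so merge keys match.
    return pd.to_datetime(values).astype("datetime64[ns]")


# Table 1 (time windows per station), reduced to the columns needed here.
table1 = pd.DataFrame({
    "End_time": ts(["2019-12-26 12:29:34", "2019-12-26 12:30:59",
                    "2019-12-26 12:32:36", "2019-12-26 12:33:54",
                    "2019-12-26 12:35:04"]),
    "stn": [501, 501, 503, 504, 505],
    "start_time": ts(["2019-12-26 12:29:13", "2019-12-26 12:30:07",
                      "2019-12-26 12:32:09", "2019-12-26 12:33:45",
                      "2019-12-26 12:34:26"]),
})

# Table 2 (alarms), also reduced to the relevant columns.
table2 = pd.DataFrame({
    "AlarmMessage": ["S501LH_B_RR_BT", "SHT_B_S503_BEAM",
                     "S503LH_B_RR_T", "SHT_B_S504_"],
    "D_time": [2, 21, 2, 21],
    "Station": [501, 503, 503, 504],
    "AlarmInTimeStamp": ts(["2019-12-26 12:29:16.5608495",
                            "2019-12-26 12:32:20.0634165",
                            "2019-12-26 12:32:25.6494806",
                            "2019-12-26 12:33:50.6719676"]),
})

# Give every window an id so alarms can be attached back to it later.
windows = table1.reset_index().rename(columns={"index": "window_id"})

# For each alarm, find the latest window at the same station that started
# at or before the alarm. Both inputs must be sorted by their time key.
matched = pd.merge_asof(
    table2.sort_values("AlarmInTimeStamp"),
    windows.sort_values("start_time"),
    left_on="AlarmInTimeStamp", right_on="start_time",
    left_by="Station", right_by="stn",
    direction="backward",
)

# Keep only alarms that also fall before the end of that window.
matched = matched[matched["AlarmInTimeStamp"] <= matched["End_time"]]
matched = matched.assign(window_id=matched["window_id"].astype("int64"))

# Left-join back so windows without alarms are kept (with NaN values).
result = windows.merge(
    matched[["window_id", "AlarmMessage", "D_time"]],
    on="window_id", how="left",
)

# Number of alarms and total D_time per window.
summary = result.groupby("window_id").agg(
    alarm_count=("AlarmMessage", "count"),
    d_time_sum=("D_time", "sum"),
)
print(result)
print(summary)
```

If windows at one station can overlap, this sketch would attach an alarm only to the window with the latest start, so a different join strategy would be needed in that case.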

Alternatively, you could run SQL against the tables.
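
A rough sketch of the SQL route, assuming an in-memory SQLite database loaded from pandas (the answer does not say which database to use, so this choice is an assumption). Timestamps are kept as text in one consistent ISO-like format, which lets SQLite compare them as strings. The index name idx_alarm is made up for illustration.

```python
import sqlite3

import pandas as pd

# Sample rows from the question, with timestamps left as plain strings.
table1 = pd.DataFrame({
    "End_time": ["2019-12-26 12:29:34", "2019-12-26 12:30:59",
                 "2019-12-26 12:32:36", "2019-12-26 12:33:54",
                 "2019-12-26 12:35:04"],
    "stn": [501, 501, 503, 504, 505],
    "start_time": ["2019-12-26 12:29:13", "2019-12-26 12:30:07",
                   "2019-12-26 12:32:09", "2019-12-26 12:33:45",
                   "2019-12-26 12:34:26"],
})
table2 = pd.DataFrame({
    "AlarmMessage": ["S501LH_B_RR_BT", "SHT_B_S503_BEAM",
                     "S503LH_B_RR_T", "SHT_B_S504_"],
    "D_time": [2, 21, 2, 21],
    "Station": [501, 503, 503, 504],
    "AlarmInTimeStamp": ["2019-12-26 12:29:16.5608495",
                         "2019-12-26 12:32:20.0634165",
                         "2019-12-26 12:32:25.6494806",
                         "2019-12-26 12:33:50.6719676"],
})

conn = sqlite3.connect(":memory:")
table1.to_sql("table1", conn, index=False)
table2.to_sql("table2", conn, index=False)

# An index on the join columns may help on large tables.
conn.execute("CREATE INDEX idx_alarm ON table2 (Station, AlarmInTimeStamp)")

# LEFT JOIN keeps windows that have no alarm (their alarm columns are NULL).
query = """
SELECT t1.*, t2.AlarmMessage, t2.D_time
FROM table1 AS t1
LEFT JOIN table2 AS t2
  ON t2.Station = t1.stn
 AND t2.AlarmInTimeStamp BETWEEN t1.start_time AND t1.End_time
ORDER BY t1.start_time, t2.AlarmInTimeStamp
"""
joined = pd.read_sql_query(query, conn)
conn.close()
print(joined)
```

Unlike the mask approach, the conditional join here is evaluated by the database, and windows with several alarms simply appear as several rows.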

Source: https://www.mango-solutions.com/in-between-a-rock-and-a-conditional-join/
