Pandas:如何处理数据帧中特定列的重叠时间?

2024-09-30 06:20:17 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图在数据帧上处理重叠的时间范围,我发现很难继续。我有一个数据框,其值如下所示。我的数据框中有大约400000行。是否有人可以帮助我使用什么函数来获得结果(对于3个用例)

事先非常感谢你的帮助

我有4个用例:

1。案例1: 报警1没有结束时间,但由于报警1、2和3的时间重叠,我需要三个时间中最低的开始时间和最大的结束时间(在本cas中,它没有结束)

   |------ALARM1--------- (no end)
    |----ALARM2-----|
|-------ALARM3--|      

<--------------------------...


    

   ELEMENT                                      TEXT                                       ID         START                       END

   OLT5134-LT4-PON0-ONT25                         ALARM1                                    5        2021-01-19 05:00:00                 NaT
   OLT5134-LT4-PON0-ONT25                         ALARM2                                 772874243   2021-01-20 06:00:00     2021-01-20 08:00:00
   OLT5134-LT4-PON0-ONT25                         ALARM3                                 772874243   2021-01-20 04:00:19     2021-01-20 10:03:16
   
   Result 
   OLT5134-LT4-PON0-ONT25                         ALARM3                                    5        2021-01-19 04:00:00                 NaT
    
    

2。案例2:Alarm2完全包含在Alarm1中

|-----ALARM1----|
   |---ALARM2-| 

<--------------->   

   ELEMENT                                      TEXT                                       ID         START                       END

   OLT5134-LT4-PON0-ONT26                         ALARM1                                    5        2021-01-19 05:00:00     2021-01-20 10:00:00
   OLT5134-LT4-PON0-ONT26                         ALARM2                                 772874243   2021-01-20 06:00:00     2021-01-20 08:00:00

   
   Result 
   OLT5134-LT4-PON0-ONT26                         ALARM1                                    5        2021-01-19 05:00:00     2021-01-20 10:00:00
    

3。案例3:Alarm2和Alarm1有一个共同的重叠时间

|-----ALARM1----|
           |---ALARM2-| 
<--------------------->

   ELEMENT                                      TEXT                                       ID         START                       END

   OLT5134-LT4-PON0-ONT27                         ALARM1                                    5        2021-01-17 05:00:00     2021-01-19 10:00:00
   OLT5134-LT4-PON0-ONT27                         ALARM2                                 772874243   2021-01-18 06:00:00     2021-01-20 08:00:00

   
   Result 
   OLT5134-LT4-PON0-ONT27                         ALARM1                                    5        2021-01-17 05:00:00     2021-01-20 08:00:00

情况4:Alarm2恰好在Alarm1的末尾开始

这里我只需要得到一行,它包含Alarm1的开始和Alarm2的结束

我试图使用以下代码,但它没有帮助,我不知道如何解决这个问题

df['newid']=(df['START']-df['END'].shift()).dt.total_seconds().gt(0).cumsum()

print(df.to_string(index=False))
grouped = df.groupby(df['ELEMENT'])
print(grouped.get_group('OLT5134-LT4-PON0-ONT29').to_string(index=False))
print(grouped.get_group('OLT5134-LT4-PON0-ONT25').to_string(index=False))

我是个新手,需要帮助

                ELEMENT                                     TEXT         ID               START                 END  newid
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 18:07:22                 NaT  10975
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 19:07:42                 NaT  10975
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 19:52:57                 NaT  10975
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 19:52:57 2021-01-19 21:38:23  10975
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 23:09:02                 NaT  10976
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 23:09:02 2021-01-19 23:23:58  10976
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 08:50:54 2021-01-20 09:05:51  10977
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 10:06:18 2021-01-20 10:36:21  10978
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 11:17:29 2021-01-20 12:02:37  10979
                ELEMENT                                     TEXT         ID               START                 END  newid
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 20:59:06 2021-01-19 21:29:08  10971
 OLT5134-LT4-PON0-ONT25                            ACCES_DEGRADE          5 2021-01-19 23:24:53 2021-01-22 03:14:53  10972
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 06:47:20 2021-01-20 07:02:17  10972
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 09:48:19 2021-01-20 10:03:16  10973
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 10:18:29 2021-01-20 11:18:41  10974
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 11:33:55 2021-01-20 11:48:52  10975

当我有以下输入时:

                ELEMENT                                     TEXT         ID               START                 END
 OLT2227-LT3-PON0-ONT27  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 22:00:00 2021-01-19 22:30:00
 OLT2227-LT3-PON0-ONT27  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 22:30:00 2021-01-19 23:44:12
 OLT2227-LT3-PON0-ONT27  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 23:30:00 2021-01-19 23:46:12
 OLT2227-LT3-PON0-ONT30  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 19:30:00                 NaT
 OLT2227-LT3-PON0-ONT30  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 20:00:00 2021-01-19 23:00:00
 OLT2227-LT3-PON0-ONT31  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 18:00:00 2021-01-19 19:00:00
 OLT2227-LT3-PON0-ONT31  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 20:30:00 2021-01-19 23:00:00

对于元素OLT2227-LT3-PON0-ONT31,它必须打印两行而不是一行,因为没有重叠


Tags: ofthesignalsdioccursalarm1degradelt4
1条回答
网友
1楼 · 发布于 2024-09-30 06:20:17

我知道可能会有比下面更简单和最优的解决方案。您可以将^{}^{}一起使用,但是如果您有大量数据,那么下面的操作可能会执行得很慢

import pandas as pd
from io import StringIO
import numpy as np

# Data preprocessing(ignore)
data = StringIO('''
   OLT5134-LT4-PON0-ONT25,                         ALARM1,                                    5,        2021-01-19 05:00:00,     NaT       
   OLT5134-LT4-PON0-ONT25,                         ALARM2,                                 772874243,   2021-01-20 06:00:00,     2021-01-20 08:00:00
   OLT5134-LT4-PON0-ONT25,                         ALARM3,                                 772874243,   2021-01-20 04:00:19,     2021-01-20 10:03:16
   OLT5134-LT4-PON0-ONT26,                         ALARM1,                                    5,        2021-01-19 05:00:00,     2021-01-20 10:00:00
   OLT5134-LT4-PON0-ONT26,                         ALARM2,                                 772874243,   2021-01-20 06:00:00,     2021-01-20 08:00:00
   OLT5134-LT4-PON0-ONT27,                         ALARM1,                                    5,        2021-01-17 05:00:00,     2021-01-19 10:00:00
   OLT5134-LT4-PON0-ONT27,                         ALARM2,                                 772874243,   2021-01-18 06:00:00,     2021-01-20 08:00:00
''')

df = pd.read_csv(data,names=['ELEMENT','TEXT','ID','START','END'])
df['START'] = pd.to_datetime(df['START'])
df['END'] = pd.to_datetime(df['END'],errors ='coerce')


#See from below
gb = df.groupby(by ='ELEMENT')

def group_filter(group):
  
  min = group['START'].min()
  max = group['END'].max()
  id = group.loc[ group['START'] == min]['ID'].iloc[0]
  text = group.loc[group['START'] == min]['TEXT'].iloc[0]
  result = None

  #Put your conditions here....
  if np.isnat(group['END']).any():
    max = pd.to_datetime('NAT',errors='coerce')
    result = pd.Series([id,text,min,max])
  
  result = pd.Series([id,text,min,max])
  return result
  
output = gb.apply(group_filter).reset_index()
output.columns = df.columns
output

输出:

ELEMENT TEXT    ID  START   END
0   OLT5134-LT4-PON0-ONT25  5   ALARM1  2021-01-19 05:00:00 NaT
1   OLT5134-LT4-PON0-ONT26  5   ALARM1  2021-01-19 05:00:00 2021-01-20 10:00:00
2   OLT5134-LT4-PON0-ONT27  5   ALARM1  2021-01-17 05:00:00 2021-01-20 08:00:00

相关问题 更多 >

    热门问题