How to find consecutive values with a condition in Python

Published 2024-10-08 18:30:12

I have the following DataFrame in pandas

 code      tank     date         time       no_operation_flag
 123       1        01-01-2019   00:00:00   1
 123       1        01-01-2019   00:30:00   1
 123       1        01-01-2019   01:00:00   0
 123       1        01-01-2019   01:30:00   1
 123       1        01-01-2019   02:00:00   1
 123       1        01-01-2019   02:30:00   1
 123       1        01-01-2019   03:00:00   1
 123       1        01-01-2019   03:30:00   1
 123       1        01-01-2019   04:00:00   1
 123       1        01-01-2019   05:00:00   1                   
 123       1        01-01-2019   14:00:00   1                     
 123       1        01-01-2019   14:30:00   1                  
 123       1        01-01-2019   15:00:00   1                  
 123       1        01-01-2019   15:30:00   1                  
 123       1        01-01-2019   16:00:00   1                    
 123       1        01-01-2019   16:30:00   1                  
 123       2        02-01-2019   00:00:00   1
 123       2        02-01-2019   00:30:00   0
 123       2        02-01-2019   01:00:00   0
 123       2        02-01-2019   01:30:00   0
 123       2        02-01-2019   02:00:00   1
 123       2        02-01-2019   02:30:00   1
 123       2        02-01-2019   03:00:00   1
 123       2        03-01-2019   03:30:00   1
 123       2        03-01-2019   04:00:00   1
 123       1        03-01-2019   14:00:00   1
 123       2        03-01-2019   15:00:00   1
 123       2        03-01-2019   00:30:00   1
 123       2        04-01-2019   11:00:00   1
 123       2        04-01-2019   11:30:00   0
 123       2        04-01-2019   12:00:00   1
 123       2        04-01-2019   13:30:00   1
 123       2        05-01-2019   03:00:00   1
 123       2        05-01-2019   03:30:00   1
 123       2        05-01-2019   04:00:00   1

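What I want to do is flag runs of more than 5 consecutive 1s at the tank and date level, but the times must be consecutive as well (in half-hour steps). The DataFrame is already sorted by tank, date and time.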

My desired DataFrame is

 code       tank      date          time        no_operation_flag   final_flag
 123       1        01-01-2019   00:00:00       1                   0                   
 123       1        01-01-2019   00:30:00       1                   0
 123       1        01-01-2019   01:00:00       0                   0  
 123       1        01-01-2019   01:30:00       1                   1
 123       1        01-01-2019   02:00:00       1                   1  
 123       1        01-01-2019   02:30:00       1                   1
 123       1        01-01-2019   03:00:00       1                   1
 123       1        01-01-2019   03:30:00       1                   1
 123       1        01-01-2019   04:00:00       1                   1
 123       1        01-01-2019   05:00:00       1                   0
 123       1        01-01-2019   14:00:00       1                   1  
 123       1        01-01-2019   14:30:00       1                   1
 123       1        01-01-2019   15:00:00       1                   1
 123       1        01-01-2019   15:30:00       1                   1
 123       1        01-01-2019   16:00:00       1                   1  
 123       1        01-01-2019   16:30:00       1                   1
 123       2        02-01-2019   00:00:00       1                   0
 123       2        02-01-2019   00:30:00       0                   0    
 123       2        02-01-2019   01:00:00       0                   0
 123       2        02-01-2019   01:30:00       0                   0
 123       2        02-01-2019   02:00:00       1                   0
 123       2        02-01-2019   02:30:00       1                   0
 123       2        02-01-2019   03:00:00       1                   0
 123       2        03-01-2019   03:30:00       1                   0
 123       2        03-01-2019   04:00:00       1                   0
 123       1        03-01-2019   14:00:00       1                   0
 123       2        03-01-2019   15:00:00       1                   0
 123       2        03-01-2019   00:30:00       1                   0
 123       2        04-01-2019   11:00:00       1                   0
 123       2        04-01-2019   11:30:00       0                   0 
 123       2        04-01-2019   12:00:00       1                   0
 123       2        04-01-2019   13:30:00       1                   0
 123       2        05-01-2019   03:00:00       1                   0
 123       2        05-01-2019   03:30:00       1                   0 
 123       2        05-01-2019   04:00:00       1                   0

How can I do this in pandas?


3 Answers

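There may be a way to do it in one pass, but a two-step approach is simpler: first select the tanks one at a time, then look for the run of 1s in each.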

This other question already covers searching for a pattern in a column.

If you would rather approach it with rolling, you can sum the 1s in each window, or test that all values in the window are True, to find a run of n elements.
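A minimal sketch of the rolling idea, assuming a run length of 6 (that is, "more than 5"). The helper `mark_runs` and the toy frame `demo` are hypothetical names, and this version checks only the flag values per tank, not the half-hour spacing of the timestamps.

```python
import pandas as pd

def mark_runs(flags: pd.Series, n: int) -> pd.Series:
    """Return 1 for every element inside a run of at least n consecutive 1s."""
    # 1 at the last position of each length-n window that contains only 1s
    window_end = flags.rolling(n).sum().eq(n).astype(int)
    # A rolling max over the reversed series spreads each hit back
    # over the n positions of its window
    spread = window_end[::-1].rolling(n, min_periods=1).max()[::-1]
    return spread.astype(int)

# Hypothetical toy data: tank 1 has a run of six 1s, tank 2 only four
demo = pd.DataFrame({
    "tank": [1] * 8 + [2] * 4,
    "no_operation_flag": [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})
demo["final_flag"] = (
    demo.groupby("tank")["no_operation_flag"]
        .transform(lambda s: mark_runs(s, 6))
)
print(demo)
```

Grouping by tank first keeps a run from spilling over from one tank into the next, which matches the two-step approach described above.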

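You can also simply mask a column, but that only gives you the values selected by the mask. It answers a different question: "which tanks are not operating at a given time?"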
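For illustration, a boolean mask might look like the sketch below. The frame `demo` and the chosen time are made up; the point is only that a mask selects rows, it does not measure run lengths.

```python
import pandas as pd

# Hypothetical toy data: two tanks observed at two times
demo = pd.DataFrame({
    "tank": [1, 2, 1, 2],
    "time": ["00:00:00", "00:00:00", "00:30:00", "00:30:00"],
    "no_operation_flag": [1, 0, 1, 1],
})

# Rows where the flag is set at the time of interest
mask = demo["no_operation_flag"].eq(1) & demo["time"].eq("00:00:00")
idle_tanks = demo.loc[mask, "tank"].tolist()
print(idle_tanks)
```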

Use:

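import pandas as pd

# Rows share a group only while flag, tank, date and 30-minute spacing all stay unbroken
df['final_flag'] = ( df.groupby([df['no_operation_flag'].ne(1).cumsum(),   # increments at every row whose flag is not 1
                                 'tank',
                                 'date',
                                 pd.to_datetime(df['time'].astype(str))
                                   .diff()
                                   .ne(pd.Timedelta(minutes = 30))        # True on any gap other than 30 minutes
                                   .cumsum(),
                                'no_operation_flag'])['no_operation_flag']
                    .transform('size')                                     # size of the group each row belongs to
                    .gt(5)                                                 # runs of more than 5 rows
                    .astype('uint8') )                                     # bool -> 0/1 (Series.view is deprecated)
print(df)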

Output

    code  tank        date      time  no_operation_flag  final_flag
0    123     1  01-01-2019  00:00:00                  1           0
1    123     1  01-01-2019  00:30:00                  1           0
2    123     1  01-01-2019  01:00:00                  0           0
3    123     1  01-01-2019  01:30:00                  1           1
4    123     1  01-01-2019  02:00:00                  1           1
5    123     1  01-01-2019  02:30:00                  1           1
6    123     1  01-01-2019  03:00:00                  1           1
7    123     1  01-01-2019  03:30:00                  1           1
8    123     1  01-01-2019  04:00:00                  1           1
9    123     1  01-01-2019  05:00:00                  1           0
10   123     1  01-01-2019  14:00:00                  1           1
11   123     1  01-01-2019  14:30:00                  1           1
12   123     1  01-01-2019  15:00:00                  1           1
13   123     1  01-01-2019  15:30:00                  1           1
14   123     1  01-01-2019  16:00:00                  1           1
15   123     1  01-01-2019  16:30:00                  1           1
16   123     2  02-01-2019  00:00:00                  1           0
17   123     2  02-01-2019  00:30:00                  0           0
18   123     2  02-01-2019  01:00:00                  0           0
19   123     2  02-01-2019  01:30:00                  0           0
20   123     2  02-01-2019  02:00:00                  1           0
21   123     2  02-01-2019  02:30:00                  1           0
22   123     2  02-01-2019  03:00:00                  1           0
23   123     2  03-01-2019  03:30:00                  1           0
24   123     2  03-01-2019  04:00:00                  1           0
25   123     1  03-01-2019  14:00:00                  1           0
26   123     2  03-01-2019  15:00:00                  1           0
27   123     2  03-01-2019  00:30:00                  1           0
28   123     2  04-01-2019  11:00:00                  1           0
29   123     2  04-01-2019  11:30:00                  0           0
30   123     2  04-01-2019  12:00:00                  1           0
31   123     2  04-01-2019  13:30:00                  1           0
32   123     2  05-01-2019  03:00:00                  1           0
33   123     2  05-01-2019  03:30:00                  1           0
34   123     2  05-01-2019  04:00:00                  1           0
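
The grouping keys above rely on the common `ne(...).cumsum()` idiom for labelling runs. A small sketch of that idiom on made-up timestamps (`times`, `run_id` and `run_size` are hypothetical names) may make it easier to follow:

```python
import pandas as pd

# Hypothetical times: three half-hour steps, a one-hour gap, then one more step
times = pd.to_datetime(
    pd.Series(["00:00", "00:30", "01:00", "02:00", "02:30"]), format="%H:%M"
)

# True wherever the gap to the previous row is not exactly 30 minutes
# (the first row has no previous row, so it also starts a run)
starts_new_run = times.diff().ne(pd.Timedelta(minutes=30))

# The cumulative sum turns those start markers into a run label per row
run_id = starts_new_run.cumsum()

# Length of the run each row belongs to
run_size = run_id.map(run_id.value_counts())

print(run_id.tolist())    # [1, 1, 1, 2, 2]
print(run_size.tolist())  # [3, 3, 3, 2, 2]
```

Passing several such labels to `groupby` at once means a row only stays in the same group when none of the conditions has been broken.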

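You can use a solution similar to this one, but with a new helper DataFrame: add all the missing datetimes within each group so that only truly consecutive datetimes form a run, flag the runs, and finally use merge to add the new column: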

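import pandas as pd

# Combine date and time into a single timestamp column
df['datetimes'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str))
# Helper frame: put each code/tank/date group on a 30-minute grid; missing slots become NaN
df1 = (df.set_index('datetimes')
          .groupby(['code','tank', 'date'])['no_operation_flag']
          .resample('30min')   # '30T' is a deprecated alias in recent pandas
          .first()
          .reset_index())

# Label runs of equal flag values within each group
shifted1 = df1.groupby(['code','tank', 'date'])['no_operation_flag'].shift()
g1 = df1['no_operation_flag'].ne(shifted1).cumsum()
# Keep runs of more than 5 rows whose flag is 1
mask1 = g1.map(g1.value_counts()).gt(5) & df1['no_operation_flag'].eq(1)

df1['final_flag'] = mask1.astype(int)
#print (df1.head(40))

# Bring the new column back onto the original rows
df = df.merge(df1[['code','tank','datetimes','final_flag']]).drop('datetimes', axis=1)

print (df)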
    code  tank        date      time  no_operation_flag  final_flag
0    123     1  01-01-2019  00:00:00                  1           0
1    123     1  01-01-2019  00:30:00                  1           0
2    123     1  01-01-2019  01:00:00                  0           0
3    123     1  01-01-2019  01:30:00                  1           1
4    123     1  01-01-2019  02:00:00                  1           1
5    123     1  01-01-2019  02:30:00                  1           1
6    123     1  01-01-2019  03:00:00                  1           1
7    123     1  01-01-2019  03:30:00                  1           1
8    123     1  01-01-2019  04:00:00                  1           1
9    123     1  01-01-2019  05:00:00                  1           0
10   123     1  01-01-2019  14:00:00                  1           1
11   123     1  01-01-2019  14:30:00                  1           1
12   123     1  01-01-2019  15:00:00                  1           1
13   123     1  01-01-2019  15:30:00                  1           1
14   123     1  01-01-2019  16:00:00                  1           1
15   123     1  01-01-2019  16:30:00                  1           1
16   123     2  02-01-2019  00:00:00                  1           0
17   123     2  02-01-2019  00:30:00                  0           0
18   123     2  02-01-2019  01:00:00                  0           0
19   123     2  02-01-2019  01:30:00                  0           0
20   123     2  02-01-2019  02:00:00                  1           0
21   123     2  02-01-2019  02:30:00                  1           0
22   123     2  02-01-2019  03:00:00                  1           0
23   123     2  03-01-2019  03:30:00                  1           0
24   123     2  03-01-2019  04:00:00                  1           0
25   123     1  03-01-2019  14:00:00                  1           0
26   123     2  03-01-2019  15:00:00                  1           0
27   123     2  03-01-2019  00:30:00                  1           0
28   123     2  04-01-2019  11:00:00                  1           0
29   123     2  04-01-2019  11:30:00                  0           0
30   123     2  04-01-2019  12:00:00                  1           0
31   123     2  04-01-2019  13:30:00                  1           0
32   123     2  05-01-2019  03:00:00                  1           0
33   123     2  05-01-2019  03:30:00                  1           0
34   123     2  05-01-2019  04:00:00                  1           0
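
To see why the resampling step matters, here is a small sketch on made-up data (`s` and `filled` are hypothetical names). The missing 04:30 slot comes back as NaN, and that NaN row is what splits the run of 1s on either side of it:

```python
import pandas as pd

# Hypothetical readings with the 04:30 slot missing
s = pd.Series(
    [1, 1, 1],
    index=pd.to_datetime(
        ["2019-01-01 04:00", "2019-01-01 05:00", "2019-01-01 05:30"]
    ),
)

# Resampling to a 30-minute grid inserts the missing slot as NaN
filled = s.resample("30min").first()
print(filled)
```

Because NaN never equals the previous value, the `ne(shifted).cumsum()` labelling starts a new run at every gap.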
