Pandas equivalent of a SQL CASE statement using the LEAD and LAG window functions

Posted 2024-09-28 19:24:02


I'm new to Python and wondering whether there is a more elegant solution.

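I have time-series data from telematics devices with a motion indicator. I need to extend the motion indicator by +/- 1 row around the points where motion actually starts and stops (shown by the motion2 column below). In SQL I do this with a CASE statement and the LEAD and LAG window functions, and I'm trying to convert that code to Python.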

Here is the data.

import pandas as pd

data = {'device':[1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2], 
    'time':[1,2,3,4,5,6,7,8,9,10,11,12,5,6,7,8,9,10,11,12,13,14],
    'motion':[0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}

df = pd.DataFrame.from_dict(data)
df = df[['device','time','motion']]

##sort data chronologically for each device
df.sort_values(['device','time'], ascending = True, inplace = True)

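This is what df looks like:

    device  time  motion
0        1     1       0
1        1     2       0
2        1     3       1
3        1     4       1
4        1     5       1
5        1     6       0
6        1     7       0
7        1     8       0
8        1     9       1
9        1    10       1
10       1    11       0
11       1    12       0
12       2     5       0
13       2     6       0
14       2     7       0
15       2     8       1
16       2     9       1
17       2    10       1
18       2    11       0
19       2    12       1
20       2    13       0
21       2    14       0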

What I need is the motion2 column below added to the DataFrame.

device, time, motion, motion2
1,1,0,0
1,2,0,1
1,3,1,1
1,4,1,1
1,5,1,1
1,6,0,1
1,7,0,0
1,8,0,1
1,9,1,1
1,10,1,1
1,11,0,1
1,12,0,0
2,5,0,0
2,6,0,0
2,7,0,1
2,8,1,1
2,9,1,1
2,10,1,1
2,11,0,1
2,12,1,1
2,13,0,1
2,14,0,0

Below is Python code that works. However, I'm wondering whether there is a more elegant way.

##create new columns for prior and next motion indicator
df['prev_motion'] = df.groupby(['device'])['motion'].shift(1)
df['next_motion'] = df.groupby(['device'])['motion'].shift(-1)

##create the desired motion2 indicator to expand +/- 1 record of the motion start and stop
df['motion2'] = df[['prev_motion', 'motion', 'next_motion']].apply(
    lambda row: 1 if row['motion']==1 else (1 if row['prev_motion']==1 or row['next_motion']==1 else 0),
    axis=1)

##drop unwanted columns        
df.drop(columns=['prev_motion', 'next_motion'], inplace = True)

This is much easier in SQL using a CASE statement and window functions (LEAD and LAG).

case 
when motion = 1 then 1
when motion = 0 and (lead(motion) over (partition by device order by time) = 1) then 1
when motion = 0 and (lag(motion) over (partition by device order by time) = 1) then 1
else 0
end as motion2
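
As a rough sketch of how the CASE expression above could map onto pandas: LAG and LEAD correspond to groupby(...).shift(1) and shift(-1), and the WHEN branches can be expressed with numpy.select. The names lag_motion, lead_motion and conditions are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

data = {'device': [1]*12 + [2]*10,
        'time': list(range(1, 13)) + list(range(5, 15)),
        'motion': [0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}
df = pd.DataFrame(data).sort_values(['device', 'time']).reset_index(drop=True)

# "partition by device order by time": group by device on the sorted frame
grouped = df.groupby('device')['motion']
lag_motion = grouped.shift(1)     # lag(motion): previous row within the device
lead_motion = grouped.shift(-1)   # lead(motion): next row within the device

# One entry per WHEN branch; NaN at a device edge compares False, like NULL in SQL
conditions = [
    df['motion'] == 1,
    (df['motion'] == 0) & (lead_motion == 1),
    (df['motion'] == 0) & (lag_motion == 1),
]
# "else 0" becomes the default
df['motion2'] = np.select(conditions, [1, 1, 1], default=0)
```

Because the shifts are computed per device, a motion at the end of one device does not spill into the first row of the next.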

1 answer
User
#1 · Posted 2024-09-28 19:24:02

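This isn't necessarily the most elegant, but it works: find the rows where motion is 1, or where motion shifted by 1 in either direction is 1. Here are two ways to do that with NumPy functions. (This answer originally reached NumPy through pd.np; that alias was deprecated in pandas 1.0 and has since been removed, so NumPy is imported directly here. See @Abhi's comment for a pure-pandas equivalent.) Note that np.roll wraps around the array and ignores device boundaries; the result matches the expected output here only because every device starts and ends with motion 0.

import numpy as np

df['motion2'] = np.where(df.motion.values | np.roll(df.motion.values, 1) | np.roll(df.motion.values, -1), 1, 0)

# The following is essentially equivalent, but maybe a bit clearer / more efficient
df['motion2'] = np.stack((df.motion.values, np.roll(df.motion.values, 1), np.roll(df.motion.values, -1))).any(0).astype(int)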

>>> df
    device  time  motion  motion2
0        1     1       0        0
1        1     2       0        1
2        1     3       1        1
3        1     4       1        1
4        1     5       1        1
5        1     6       0        1
6        1     7       0        0
7        1     8       0        1
8        1     9       1        1
9        1    10       1        1
10       1    11       0        1
11       1    12       0        0
12       2     5       0        0
13       2     6       0        0
14       2     7       0        1
15       2     8       1        1
16       2     9       1        1
17       2    10       1        1
18       2    11       0        1
19       2    12       1        1
20       2    13       0        1
21       2    14       0        0
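
A possible per-device variant, sketched under the assumption that a centered 3-row rolling maximum within each device is an acceptable pure-pandas way to express "motion in this row or in either neighbour". The small demo frame is made-up data (not from the question), chosen so that device 1 ends in motion and the difference from a whole-column np.roll becomes visible.

```python
import numpy as np
import pandas as pd

# Made-up data: device 1 ends in motion, device 2 never moves
demo = pd.DataFrame({'device': [1, 1, 1, 2, 2, 2],
                     'time':   [1, 2, 3, 1, 2, 3],
                     'motion': [0, 0, 1, 0, 0, 0]})

# Centered 3-row window per device: 1 if this row or a neighbour
# in the same device has motion == 1
demo['motion2'] = (
    demo.groupby('device')['motion']
        .transform(lambda s: s.rolling(3, center=True, min_periods=1).max())
        .astype(int)
)

# Whole-column np.roll for comparison: it looks across the device boundary
m = demo['motion'].to_numpy()
demo['rolled'] = (m | np.roll(m, 1) | np.roll(m, -1)).astype(int)

print(demo)
```

In this sketch the first row of device 2 gets 0 in motion2 but 1 in rolled, because np.roll pulls in the last row of device 1.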
