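I have a pandas DataFrame that I want to resample into 10-second bins per id. I also want to extend the output so that each row reports the start time and end time of the samples in that bin, i.e. the first and last timestamp actually observed for that id within the bin. The DataFrame, the expected output, and what I have tried are shown below.
DataFrame: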
id,date,value
1,2012-01-01 00:09:45,1
1,2012-01-01 00:09:46,1
2,2012-01-01 00:09:47,1
1,2012-01-01 00:09:47,1
2,2012-01-01 00:09:48,1
1,2012-01-01 00:09:51,1
1,2012-01-01 00:09:52,1
1,2012-01-01 00:09:53,1
2,2012-01-01 00:10:00,1
2,2012-01-01 00:10:01,1
2,2012-01-01 00:10:04,1
2,2012-01-01 00:10:05,1
2,2012-01-01 00:10:06,1
3,2012-01-01 00:30:04,1
3,2012-01-01 00:30:05,1
3,2012-01-01 00:30:06,1
3,2012-01-01 00:30:08,1
3,2012-01-01 00:30:09,1
2,2012-01-01 00:30:18,1
2,2012-01-01 00:30:19,1
2,2012-01-01 00:30:23,1
2,2012-01-01 00:30:24,1
3,2012-01-01 00:30:25,1
3,2012-01-01 00:30:26,1
3,2012-01-01 00:30:29,1
3,2012-01-01 00:30:30,1
3,2012-01-01 00:30:32,1
3,2012-01-01 00:30:33,1
Expected output:
id,date,value,start-time,end-time
1,2012-01-01 00:09:40,3,2012-01-01 00:09:45,2012-01-01 00:09:47
2,2012-01-01 00:09:40,2,2012-01-01 00:09:47,2012-01-01 00:09:48
1,2012-01-01 00:09:50,3,2012-01-01 00:09:51,2012-01-01 00:09:53
2,2012-01-01 00:10:00,5,2012-01-01 00:10:00,2012-01-01 00:10:06
3,2012-01-01 00:30:00,5,2012-01-01 00:30:04,2012-01-01 00:30:09
2,2012-01-01 00:30:10,2,2012-01-01 00:30:18,2012-01-01 00:30:19
2,2012-01-01 00:30:20,2,2012-01-01 00:30:23,2012-01-01 00:30:24
3,2012-01-01 00:30:20,3,2012-01-01 00:30:25,2012-01-01 00:30:29
3,2012-01-01 00:30:30,3,2012-01-01 00:30:30,2012-01-01 00:30:33
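Here is what I have done so far, along with its output:
import pandas as pd

df = pd.read_csv('df.csv')
df['date'] = pd.to_datetime(df['date'])
# Sum value per id in 10-second bins
df_resampled = df.set_index('date').groupby('id').resample('10s')['value'].sum().reset_index()
# resample() also emits the empty bins between observations (sum 0); drop them
df = df_resampled[df_resampled['value']!=0]
print(df.sort_values(['date']))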
Output so far:
id,date,value
1,2012-01-01 00:09:40,3
2,2012-01-01 00:09:40,2
1,2012-01-01 00:09:50,3
2,2012-01-01 00:10:00,5
3,2012-01-01 00:30:00,5
2,2012-01-01 00:30:10,2
2,2012-01-01 00:30:20,2
3,2012-01-01 00:30:20,3
3,2012-01-01 00:30:30,3
How can I extend this simple code to also include the start and end time of each 10-second sample for each id?
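One possible approach, sketched here rather than offered as the definitive answer: floor each timestamp to its 10-second bin, group by id and bin, and use named aggregation to take the sum of value together with the minimum and maximum of date. This assumes start-time and end-time mean the earliest and latest timestamp observed in each bin, which is what the expected output suggests. The names csv_text, bin and out are introduced only for illustration, and the inline CSV stands in for df.csv so the snippet runs on its own.

```python
import io

import pandas as pd

# Inline copy of the sample data from the question (stands in for df.csv).
csv_text = """id,date,value
1,2012-01-01 00:09:45,1
1,2012-01-01 00:09:46,1
2,2012-01-01 00:09:47,1
1,2012-01-01 00:09:47,1
2,2012-01-01 00:09:48,1
1,2012-01-01 00:09:51,1
1,2012-01-01 00:09:52,1
1,2012-01-01 00:09:53,1
2,2012-01-01 00:10:00,1
2,2012-01-01 00:10:01,1
2,2012-01-01 00:10:04,1
2,2012-01-01 00:10:05,1
2,2012-01-01 00:10:06,1
3,2012-01-01 00:30:04,1
3,2012-01-01 00:30:05,1
3,2012-01-01 00:30:06,1
3,2012-01-01 00:30:08,1
3,2012-01-01 00:30:09,1
2,2012-01-01 00:30:18,1
2,2012-01-01 00:30:19,1
2,2012-01-01 00:30:23,1
2,2012-01-01 00:30:24,1
3,2012-01-01 00:30:25,1
3,2012-01-01 00:30:26,1
3,2012-01-01 00:30:29,1
3,2012-01-01 00:30:30,1
3,2012-01-01 00:30:32,1
3,2012-01-01 00:30:33,1
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

out = (
    # Hypothetical helper column: the 10-second bin each row falls into.
    df.assign(bin=df["date"].dt.floor("10s"))
    .groupby(["id", "bin"], as_index=False)
    .agg(
        value=("value", "sum"),        # total value per id and bin
        start_time=("date", "min"),    # earliest timestamp seen in the bin
        end_time=("date", "max"),      # latest timestamp seen in the bin
    )
    # Rename to the column names used in the expected output.
    .rename(columns={"bin": "date", "start_time": "start-time", "end_time": "end-time"})
    .sort_values(["date", "id"])
    .reset_index(drop=True)
)

print(out.to_csv(index=False))
```

Because only bins that contain at least one row are ever formed, the empty bins that resample() produces never appear, so the value != 0 filter is not needed in this variant.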