我想对一个数据帧应用滚动函数,这个数据帧不是由日期戳索引的,而是由实验开始后的相对星期索引的。 例如,这是我的起始数据帧:
df = pd.DataFrame([{'RelWeek': 3, 'id': 33, 'Val_1': 20,'Val_2':0.6,'Val_3':40,'TimeofDay':'MORN'}
,{'RelWeek': 3, 'id': 33, 'Val_1': 40,'Val_2':0.9,'Val_3':27,'TimeofDay':'NOON'}
,{'RelWeek': 3, 'id': 33, 'Val_1': 24,'Val_2':0.65,'Val_3':44,'TimeofDay':'EVE'}
,{'RelWeek': 4, 'id': 33, 'Val_1': 25,'Val_2':0.34,'Val_3':55,'TimeofDay':'MORN'}
,{'RelWeek': 4, 'id': 33, 'Val_1': 21,'Val_2':0.61,'Val_3':34,'TimeofDay':'NOON'}
,{'RelWeek': 4, 'id': 33, 'Val_1': 25,'Val_2':0.67,'Val_3':21,'TimeofDay':'EVE'}
,{'RelWeek': 5, 'id': 33, 'Val_1': 20,'Val_2':0.6,'Val_3':40,'TimeofDay':'MORN'}
,{'RelWeek': 5, 'id': 33, 'Val_1': np.nan,'Val_2':np.nan,'Val_3': np.nan,'TimeofDay':'NOON'}
,{'RelWeek': 5, 'id': 33, 'Val_1': 25,'Val_2':0.68,'Val_3':48,'TimeofDay':'EVE'}
,{'RelWeek': 3, 'id': 33, 'Val_1': 20,'Val_2':0.6,'Val_3':40,'TimeofDay':'MORN'}
,{'RelWeek': 3, 'id': 33, 'Val_1': 40,'Val_2':0.9,'Val_3':27,'TimeofDay':'NOON'}
,{'RelWeek': 2, 'id': 34, 'Val_1': 24,'Val_2':0.65,'Val_3':44,'TimeofDay':'EVE'}
,{'RelWeek': 2, 'id': 34, 'Val_1': 25,'Val_2':0.34,'Val_3':55,'TimeofDay':'MORN'}
,{'RelWeek': 2, 'id': 34, 'Val_1': 21,'Val_2':0.61,'Val_3':34,'TimeofDay':'NOON'}
,{'RelWeek': 3, 'id': 34, 'Val_1': 25,'Val_2':0.67,'Val_3':21,'TimeofDay':'EVE'}
,{'RelWeek': 3, 'id': 34, 'Val_1': 20,'Val_2':0.6,'Val_3':40,'TimeofDay':'MORN'}
,{'RelWeek': 3, 'id': 34, 'Val_1': 25,'Val_2':0.68,'Val_3':48,'TimeofDay':'EVE'}])
df
RelWeek TimeofDay Val_1 Val_2 Val_3 id
0 3 MORN 20.0 0.60 40.0 33
1 3 NOON 40.0 0.90 27.0 33
2 3 EVE 24.0 0.65 44.0 33
3 4 MORN 25.0 0.34 55.0 33
4 4 NOON 21.0 0.61 34.0 33
5 4 EVE 25.0 0.67 21.0 33
6 5 MORN 20.0 0.60 40.0 33
7 5 NOON NaN NaN NaN 33
8 5 EVE 25.0 0.68 48.0 33
9 3 MORN 20.0 0.60 40.0 33
10 3 NOON 40.0 0.90 27.0 33
11 2 EVE 24.0 0.65 44.0 34
12 2 MORN 25.0 0.34 55.0 34
13 2 NOON 21.0 0.61 34.0 34
14 3 EVE 25.0 0.67 21.0 34
15 3 MORN 20.0 0.60 40.0 34
16 3 EVE 25.0 0.68 48.0 34
I want to do 3 things now
- groupby('id' ,Timeof Day')
- Take the max and min 'RelWeek for each id and iterate over RelWeek. It is important to note that relative week is different for each id . Implement a rolling function for Window= 3 or ( a flexible number) over Range of min to max Iter value
- the output of the rolling function should appear as - "Original column name_TimeofDay_function'(By function I mean the aggregate function applied to rolling window, for e.g Min, Max, Stdev, Mean etc )
输出:(下面是理想输出的外观就像。那个数字现在只是虚数,并不反映实际的平均值或标准偏差等。列的名称真正反映了列的内容应该是什么。任何替代的输出方式,只要它给我的信息,上述要求也工作
Output_df
Val_1_EVE_Mean Val_1_EVE_Stdev Val_1_MORN_Mean Val_1_MORN_Stdev \
0 24.6 24.6 21.6 21.6
1 24.6 24.6 21.6 21.6
2 24.6 24.6 21.6 21.6
3 24.6 24.6 21.6 21.6
4 24.6 24.6 21.6 21.6
5 24.6 24.6 21.6 21.6
Val_1_NOON_Mean Val_1_NOON_Stdev Val_2_EVE_Mean Val_2_EVE_Stdev \
0 33.6 33.6 24.6 24.6
1 33.6 33.6 24.6 24.6
2 33.6 33.6 24.6 24.6
3 33.6 33.6 24.6 24.6
4 33.6 33.6 24.6 24.6
5 33.6 33.6 24.6 24.6
Val_2_MORN_Mean Val_2_MORN_Stdev Val_2_NOON_Mean Val_2_NOON_Stdev \
0 0.51 0.51 33.6 33.6
1 0.51 0.51 33.6 33.6
2 0.51 0.51 33.6 33.6
3 0.51 0.51 33.6 33.6
4 0.51 0.51 33.6 33.6
5 0.51 0.51 33.6 33.6
Val_3_EVE_Mean Val_3_EVE_Stdev Val_3_MORN_Mean Val_3_MORN_Stdev \
0 24.6 24.6 21.6 21.6
1 24.6 24.6 21.6 21.6
2 24.6 24.6 21.6 21.6
3 24.6 24.6 21.6 21.6
4 24.6 24.6 21.6 21.6
5 24.6 24.6 21.6 21.6
Val_3_NOON_Mean Val_3_NOON_Stdev Window id
0 33.6 33.6 [3, 4, 5] 33
1 33.6 33.6 [4, 5, 6] 33
2 33.6 33.6 [5, 6, 7] 33
3 33.6 33.6 [2, 3, 4] 34
4 33.6 33.6 [3, 4, 5] 34
5 33.6 33.6 [4, 5, 6] 34
提前感谢专家们能给我一些方向
目前没有回答
相关问题 更多 >
编程相关推荐