Unstacking a large Pandas DataFrame using two indexes in a time window

Posted 2024-06-23 18:30:18


I have the following DataFrame:

| car_id | timestamp           | gas | odometer | temperature |
|--------|---------------------|-----|----------|-------------|
| aac43f | 2019-10-05 14:00:00 | 70  | 152042   | 87          |
| aac43f | 2019-10-05 15:00:00 | 63  | 152112   | 88          |
| aac43f | 2019-10-05 18:00:00 | 44  | 152544   | 93          |
| bg112  | 2019-08-22 09:00:00 | 90  | 1242     | 85          |
| bg112  | 2019-08-22 10:00:00 | 89  | 1270     | 85          |
| 32rre  | 2019-01-01 12:00:00 | 20  | 84752    | 74          |
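
To reproduce the example, the table above can be built roughly like this. This is a minimal sketch: the variable name `df` and the choice to parse `timestamp` as datetimes are assumptions, not something stated in the post.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: timestamps are parsed to datetime64 so that time-based
# operations (resampling, adding hour offsets) work on them.
df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})

print(df.dtypes)
```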

I want to transform it by car_id and timestamp so that the new features are the sensor readings from 1 hour and 2 hours earlier, like this:

| car_id | timestamp           | gas | gas_1h_ago | gas_2h_ago | odometer | o_1h   | o_2h | temperature | t_1h_ago | t_2h_ago |
|--------|---------------------|-----|------------|------------|----------|--------|------|-------------|----------|----------|
| aac43f | 2019-10-05 14:00:00 | 70  | NaN        | NaN        | 152042   | NaN    | NaN  | 87          | NaN      | NaN      |
| aac43f | 2019-10-05 15:00:00 | 63  | 70         | NaN        | 152112   | 152042 | NaN  | 88          | 87       | NaN      |
| aac43f | 2019-10-05 18:00:00 | 44  | NaN        | NaN        | 152544   | NaN    | NaN  | 93          | NaN      | NaN      |
| bg112  | 2019-08-22 09:00:00 | 90  | NaN        | NaN        | 1242     | NaN    | NaN  | 85          | NaN      | NaN      |
| bg112  | 2019-08-22 10:00:00 | 89  | 90         | NaN        | 1270     | 1242   | NaN  | 85          | 85       | NaN      |
| 32rre  | 2019-01-01 12:00:00 | 20  | NaN        | NaN        | 84752    | NaN    | NaN  | 74          | NaN      | NaN      |

I think I could use the unstack function, but I can't work out a solution.


1 answer
User
#1 · Posted 2024-06-23 18:30:18

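You can use DataFrame.groupby together with DataFrame.resample to build an hourly grid of samples for each car. Use DataFrame.shift to move each hour's values forward by x hours, then DataFrame.reindex to bring the result back to the original rows. Add a suffix for the x-hours-ago offset to the column names with DataFrame.add_suffix. Do this for every x and join the pieces with pd.concat.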

Finally, join the resulting DataFrame with the original DataFrame, again with pd.concat. Reorder the columns with set_index + sort_index + reset_index.

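import pandas as pd

# df is the DataFrame from the question, with a datetime 'timestamp' column
# on whole hours and the default 0..n-1 index
hours_ago = [1, 2]
value_cols = ['gas', 'odometer', 'temperature']  # sensor columns to lag

# Original (car_id, timestamp) pairs, used to map the hourly grid back to df's rows
keys = pd.MultiIndex.from_frame(df[['car_id', 'timestamp']])

# Build one lagged frame per offset, then concatenate them column-wise
df_x_hours_ago = pd.concat(
    [(df.groupby('car_id')
        # hourly grid per car; min_count=1 leaves hours without readings as NaN
        # ('h' is the hourly alias; uppercase 'H' is deprecated since pandas 2.2)
        .apply(lambda x: x.resample('h', on='timestamp')[value_cols]
                          .sum(min_count=1)
                          .shift(hour))
        .reindex(keys)                # back to the original rows, per car
        .add_suffix(f'_{hour}h_ago')
        .reset_index(drop=True))      # align positionally with df
     for hour in hours_ago],
    axis=1)

# Join with the original frame and sort the columns alphabetically
new_df = (pd.concat([df, df_x_hours_ago], axis=1)
            .set_index(['car_id', 'timestamp'])
            .sort_index(axis=1)
            .reset_index())
print(new_df)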

Output

   car_id           timestamp  gas  gas_1h_ago  gas_2h_ago  odometer  \
0  aac43f 2019-10-05 14:00:00   70         NaN         NaN    152042   
1  aac43f 2019-10-05 15:00:00   63        70.0         NaN    152112   
2  aac43f 2019-10-05 18:00:00   44         NaN         NaN    152544   
3   bg112 2019-08-22 09:00:00   90         NaN         NaN      1242   
4   bg112 2019-08-22 10:00:00   89        90.0         NaN      1270   
5   32rre 2019-01-01 12:00:00   20         NaN         NaN     84752   

   odometer_1h_ago  odometer_2h_ago  temperature  temperature_1h_ago  \
0              NaN              NaN           87                 NaN   
1         152042.0              NaN           88                87.0   
2              NaN              NaN           93                 NaN   
3              NaN              NaN           85                 NaN   
4           1242.0              NaN           85                85.0   
5              NaN              NaN           74                 NaN   

   temperature_2h_ago  
0                 NaN  
1                 NaN  
2                 NaN  
3                 NaN  
4                 NaN  
5                 NaN  
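
Resampling materializes every hour between a car's first and last reading, which can get expensive on a large frame with long gaps. A possible alternative, sketched here under assumptions, is a self-merge: shift the timestamps forward by x hours and left-join back on (car_id, timestamp). This is a different technique from the resampling approach above. It assumes each (car_id, timestamp) pair is unique and that readings fall exactly x hours apart. The helper name `add_lags` and the sample `df` are only for illustration.

```python
import pandas as pd

# Sample data from the question (assumed to be parsed as datetimes).
df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})
value_cols = ["gas", "odometer", "temperature"]


def add_lags(frame, hours, cols):
    """Hypothetical helper: attach readings taken exactly h hours earlier."""
    out = frame.copy()
    for h in hours:
        lagged = frame[["car_id", "timestamp"] + cols].copy()
        # A reading taken at time t is the "h hours ago" value for time t + h.
        lagged["timestamp"] = lagged["timestamp"] + pd.Timedelta(hours=h)
        lagged = lagged.rename(columns={c: f"{c}_{h}h_ago" for c in cols})
        # Left join keeps every original row; unmatched rows get NaN.
        out = out.merge(lagged, on=["car_id", "timestamp"], how="left")
    return out


new_df = add_lags(df, [1, 2], value_cols)
print(new_df[["car_id", "timestamp", "gas", "gas_1h_ago", "gas_2h_ago"]])
```

The columns come out in insertion order rather than sorted; the same set_index + sort_index + reset_index step can be applied afterwards if the alphabetical layout is wanted.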

To get 0 instead of NaN for hours that have no readings, remove min_count=1 from the sum call.
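
A small sketch of what min_count changes, using a made-up two-reading series (the values and variable names are only for illustration). Without min_count, an empty hourly bin sums to 0; with min_count=1 it stays NaN. Rows whose lag falls before a car's first reading would still come out as NaN either way, because shift itself introduces NaN.

```python
import pandas as pd

# Two readings with an empty 15:00 hour between them.
s = pd.Series(
    [70, 44],
    index=pd.to_datetime(["2019-10-05 14:00", "2019-10-05 16:00"]),
)

with_nan = s.resample("h").sum(min_count=1)  # empty 15:00 bin stays NaN
with_zero = s.resample("h").sum()            # empty 15:00 bin becomes 0

print(with_nan)
print(with_zero)
```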
