Pandas: average over the same hour

Posted 2024-06-03 15:30:18

I have a CSV that looks like this:

YYYY-MO-DD HH-MI-SS_SSS             ATMOSPHERIC PRESSURE (hPa) mean
2/24/2016 13:00                            1011.937618
2/24/2016 14:00                            1011.721583
2/24/2016 15:00                            1011.348064
2/24/2016 16:00                            1011.30785
2/24/2016 17:00                            1011.3198
2/24/2016 18:00                            1011.403372
2/24/2016 19:00                            1011.485108
2/24/2016 20:00                            1011.270083
2/24/2016 21:00                            1010.936331
2/24/2016 22:00                            1010.920958
2/24/2016 23:00                            1010.816478
2/25/2016 00:00                            1010.899142
2/25/2016 01:00                            1010.209392
2/25/2016 02:00                            1009.700625
2/25/2016 03:00                            1009.457683
2/25/2016 04:00                            1009.268081
2/25/2016 05:00                            1009.718639
2/25/2016 06:00                            1010.745444
2/25/2016 07:00                            1011.062028
2/25/2016 08:00                            1011.168117
2/25/2016 09:00                            1010.771281
2/25/2016 10:00                            1010.138053
2/25/2016 11:00                            1009.509119
2/25/2016 12:00                            1008.703811
2/25/2016 13:00                            1008.021547
2/25/2016 14:00                            1007.774825
   .....                                     .....

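I want to create a new DataFrame that contains, for each hour of the day, the mean of the values recorded at that hour across all days.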

Is there a simple way to do this?

Thanks!


1 answer
User
#1 · Posted 2024-06-03 15:30:18

Once the dates have been parsed into a pandas datetime Series, you can use the .dt accessor to get the hour of each timestamp:

df['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YYYY-MO-DD HH-MI-SS_SSS'])
hour = pd.to_timedelta(df['YYYY-MO-DD HH-MI-SS_SSS'].dt.hour, unit='h')

Then you can group by hour and compute the mean of each group. A complete example:

import pandas as pd
df = pd.DataFrame(
    {'ATMOSPHERIC PRESSURE (hPa) mean': 
     [1011.937618, 1011.721583, 1011.348064, 1011.30785, 1011.3198, 1011.403372, 
      1011.485108, 1011.270083, 1010.936331, 1010.920958, 1010.816478, 1010.899142, 
      1010.209392, 1009.700625, 1009.457683, 1009.268081, 1009.718639, 1010.745444, 
      1011.062028, 1011.168117, 1010.771281, 1010.138053, 1009.509119, 1008.703811, 
      1008.021547, 1007.774825],
     'YYYY-MO-DD HH-MI-SS_SSS': 
     ['2/24/2016 13:00', '2/24/2016 14:00', '2/24/2016 15:00', '2/24/2016 16:00', 
      '2/24/2016 17:00', '2/24/2016 18:00', '2/24/2016 19:00', '2/24/2016 20:00', 
      '2/24/2016 21:00', '2/24/2016 22:00', '2/24/2016 23:00', '2/25/2016 00:00', 
      '2/25/2016 01:00', '2/25/2016 02:00', '2/25/2016 03:00', '2/25/2016 04:00', 
      '2/25/2016 05:00', '2/25/2016 06:00', '2/25/2016 07:00', '2/25/2016 08:00', 
      '2/25/2016 09:00', '2/25/2016 10:00', '2/25/2016 11:00', '2/25/2016 12:00', 
      '2/25/2016 13:00', '2/25/2016 14:00']})
df['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YYYY-MO-DD HH-MI-SS_SSS'])

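# Convert the hour of day (0-23) to a Timedelta; lowercase 'h' replaces the deprecated 'H' alias
hour = pd.to_timedelta(df['YYYY-MO-DD HH-MI-SS_SSS'].dt.hour, unit='h')
hour.name = 'Hour'
# Average only the pressure column within each hour-of-day group
result = df.groupby(hour)[['ATMOSPHERIC PRESSURE (hPa) mean']].mean()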

which yields (recent pandas versions print the index values as 0 days 00:00:00 and so on):

                         ATMOSPHERIC PRESSURE (hPa) mean
Hour
00:00:00                                     1010.899142
01:00:00                                     1010.209392
02:00:00                                     1009.700625
03:00:00                                     1009.457683
04:00:00                                     1009.268081
05:00:00                                     1009.718639
...
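
As a variation on the answer above, here is a minimal sketch that starts from CSV text and groups by the plain integer hour (0-23) instead of a Timedelta. It assumes the file is comma-separated and that every timestamp follows the month/day/year hour:minute layout shown in the question; the names csv_text, time_col, value_col and hourly_mean are illustrative and do not come from the original post.

```python
import io

import pandas as pd

# A few rows in the same layout as the question's data. The comma delimiter
# is an assumption; the real file may use a different separator.
csv_text = """YYYY-MO-DD HH-MI-SS_SSS,ATMOSPHERIC PRESSURE (hPa) mean
2/24/2016 13:00,1011.937618
2/24/2016 14:00,1011.721583
2/25/2016 13:00,1008.021547
2/25/2016 14:00,1007.774825
"""

time_col = 'YYYY-MO-DD HH-MI-SS_SSS'
value_col = 'ATMOSPHERIC PRESSURE (hPa) mean'

df = pd.read_csv(io.StringIO(csv_text))

# Parse the timestamps with an explicit format (assumed from the sample rows).
df[time_col] = pd.to_datetime(df[time_col], format='%m/%d/%Y %H:%M')

# Group by the integer hour of day and average the pressure across days.
hour_of_day = df[time_col].dt.hour.rename('Hour')
hourly_mean = df.groupby(hour_of_day)[value_col].mean()

print(hourly_mean)
```

Grouping by the integer hour gives an index of plain numbers, which may be easier to plot or merge than Timedelta values. With the four sample rows, hours 13 and 14 each average two readings, one per day.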
