I have a very large minute-level time series dataset (3 months) in the following format:
datetime,val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12
1/06/2017 0:00,0,0,0,0,0,0,0,0,0,0.011,0,0.036
1/06/2017 0:01,0,0,0,0,0,0,0,0,0,0.011,0,0.036
...
1/06/2017 23:59,0,0,0,0,0,0,0,0,0,0.011,0,0.035
2/06/2017 0:00,0,0,0,0,0,0,0,0,0,0.014,0,0.036
2/06/2017 0:01,0,0,0,0,0,0,0,0,0,0.011,0,0.036
...
2/06/2017 23:59,0,0,0,0,0,0,0,0,0,0.011,0,0.035
....
31/08/2017 0:00,0,0.2,0,0,0,0.56,0,0,0,0.014,0,0.036
31/08/2017 0:01,0,0.23,0,0,0,0,0,0,0,0.011,0,0.032
...
31/08/2017 23:59,0,0,0,0,0,0,.55,0,0,0.011,0,0.034
What is the most efficient way to get the monthly average of each column using pandas? The expected output is:
month,val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12
06/2017,0,0,0,0,0,0,0,0,0,0.011,0,0.036
07/2017,0,0,0,0,0,0,0,0,0,0.014,0,0.036
08/2017,0,0,0.21,0,0,0,0,0.52,0,0.011,0,0.036
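Currently, I read the dataset day by day, build up a dataset of cumulative daily values, and then divide by the number of days in each month. But this is very inefficient and takes a lot of time.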
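First convert the column to datetimes, then resample by `MS` (month start), and finally change the format of the DatetimeIndex to `MM/YYYY`.
Alternatively, pass the converted datetimes column to `groupby` and aggregate with `mean`.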
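The steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code the answer used: the function names `pd.to_datetime`, `resample`, and `strftime` are my assumption about which pandas calls were meant, the sample data is a tiny made-up stand-in with only three value columns, and the day-first date format is inferred from dates such as `31/08/2017` in the question.

```python
import io
import pandas as pd

# Tiny made-up stand-in for the real file (three value columns for brevity).
csv_text = """datetime,val1,val2,val3
1/06/2017 0:00,0,0.011,0.036
1/06/2017 0:01,0,0.011,0.036
2/06/2017 0:00,0,0.014,0.036
1/07/2017 0:00,0,0.2,0.5
1/07/2017 0:01,0,0.4,0.7
"""

df = pd.read_csv(io.StringIO(csv_text))

# Dates in the question are day-first, so the format is given explicitly.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")
indexed = df.set_index("datetime")

# Variant 1: resample at month-start frequency, then relabel the index as MM/YYYY.
monthly = indexed.resample("MS").mean()
monthly.index = monthly.index.strftime("%m/%Y")
monthly.index.name = "month"

# Variant 2: group by the formatted month string and aggregate with mean.
# Note: string keys sort lexicographically, which matches calendar order
# only while all rows fall within a single year.
monthly_gb = indexed.groupby(indexed.index.strftime("%m/%Y")).mean()
monthly_gb.index.name = "month"

print(monthly)
```

Both variants give one row per month. One difference to keep in mind: `resample` also emits rows (filled with NaN) for any empty months inside the date range, while `groupby` only returns months that actually occur in the data.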
pandas `read_csv` and `to_csv` are what you need. With your input data (filtered from …), this gives the monthly means.
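A minimal sketch of that read-aggregate-write round trip might look like this. The in-memory `io.StringIO` buffer stands in for the real CSV file, the three sample rows are made up, and grouping by `to_period("M")` is my choice for the aggregation step rather than something the answer states. With a real file you would pass paths to `read_csv` and `to_csv` instead.

```python
import io
import pandas as pd

# Made-up sample standing in for the real CSV file on disk.
raw = io.StringIO(
    "datetime,val1,val2\n"
    "1/06/2017 0:00,0,0.011\n"
    "1/06/2017 0:01,0,0.013\n"
    "31/08/2017 23:59,0.5,0.011\n"
)

df = pd.read_csv(raw)

# Day-first dates, as in the question's sample rows.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")
df = df.set_index("datetime")

# Group rows by calendar month and average every column.
monthly = df.groupby(df.index.to_period("M")).mean()

# Relabel the months as MM/YYYY to match the expected output.
monthly.index = monthly.index.strftime("%m/%Y")

# With no path argument, to_csv returns the CSV text instead of writing a file.
out = monthly.to_csv(index_label="month")
print(out)
```

Reading the whole file once and letting pandas do the grouping avoids the day-by-day loop described in the question.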