Mean aggregation with pandas GroupBy and time-series resampling

#CSV Import
import pandas as pd
from datetime import datetime

path = r'Z:\Python\30_Min_Data.txt'
customdateparse = lambda x: datetime.strptime(x, '%Y/%m/%d %H:%M:%S.%f')
df = pd.read_csv(
    path,
    parse_dates={'DateTime': [0, 1]},
    date_parser=customdateparse)

# Set the Date as the Index --> needed for Resampling
df.set_index('DateTime', inplace=True)
df.sort_index()  # returns a sorted copy; df itself is not reordered

df
Out[3]:
                     Volume Session
DateTime
2020-12-16 08:00:00    1000    PRTH
2020-12-16 08:30:00    5000    PRTH
2020-12-16 09:00:00    1000     RTH
2020-12-16 09:30:00    3000     RTH
2020-12-17 08:00:00    2000    PRTH
2020-12-17 08:30:00    2000    PRTH
2020-12-17 09:00:00    2000     RTH
2020-12-17 09:30:00    2000     RTH
2020-12-18 08:00:00    1000    PRTH
2020-12-18 08:30:00    1000    PRTH
2020-12-18 09:00:00    1000     RTH
2020-12-18 09:30:00    1000     RTH
2019-11-18 08:00:00    1000    PRTH
2019-11-18 08:30:00    1000    PRTH
2019-11-18 09:00:00    1000     RTH
2019-11-18 09:30:00    1000     RTH

#2. Volume: Average per Year & Session & Day
funcs_year = lambda idx: idx.year
(df
   .groupby([funcs_year, 'Session', pd.Grouper(freq='D')])
   ['Volume']
   .mean()
)
Out[6]:
      Session  DateTime
2019  PRTH     2019-11-18    1000
      RTH      2019-11-18    1000
2020  PRTH     2020-12-16    3000
               2020-12-17    2000
               2020-12-18    1000
      RTH      2020-12-16    2000
               2020-12-17    2000
               2020-12-18    1000
Name: Volume, dtype: int64
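For anyone without the original file, the grouping above can be tried on an in-memory copy of the printed rows. This is only a sketch: the frame is rebuilt by hand from the output shown in the question, the level name `Year` is an illustrative choice, and `df.index.year` stands in for the `funcs_year` lambda.

```python
import pandas as pd

# Rebuild the sample rows by hand from the question's printed DataFrame.
days = ['2020-12-16', '2020-12-17', '2020-12-18', '2019-11-18']
times = ['08:00', '08:30', '09:00', '09:30']
stamps = pd.to_datetime([f'{d} {t}' for d in days for t in times])

df = pd.DataFrame(
    {
        'Volume': [1000, 5000, 1000, 3000] + [2000] * 4 + [1000] * 8,
        'Session': ['PRTH', 'PRTH', 'RTH', 'RTH'] * 4,
    },
    index=stamps.rename('DateTime'),
).sort_index()  # assign the result so the index really is sorted

# Mean volume per (year, session, calendar day).
# 'Year' is a hypothetical level name chosen for this sketch.
daily_mean = (
    df.groupby([df.index.year.rename('Year'), 'Session', pd.Grouper(freq='D')])
      ['Volume']
      .mean()
)
print(daily_mean)
```

Sorting before grouping keeps the positional year key aligned with the rows that `pd.Grouper` bins by day.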

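2 Answers

User

#1 · Edited 2024-09-29 23:21:25

Based on your question: "sum" shows the total volume per year, and "mean" shows the mean volume based on the daily values, both grouped by 'Session' and 'DateTime'. (I just chained a couple of groupby calls.)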

import pandas as pd

data = { 
'DateTime':['2020-12-16 08:00:00','2020-12-16 08:30:00','2020-12-16 09:00:00','2020-12-16 09:30:00','2020-12-17 08:00:00','2020-12-17 08:30:00','2020-12-17 09:00:00','2020-12-17 09:30:00','2020-12-18 08:00:00','2020-12-18 08:30:00','2020-12-18 09:00:00','2020-12-18 09:30:00','2019-11-18 08:00:00','2019-11-18 08:30:00','2019-11-18 09:00:00','2019-11-18 09:30:00'],
'Volume':[1000,5000,1000,3000,2000,2000,2000,2000,1000,1000,1000,1000,1000,1000,1000,1000],
'Session':['PRTH','PRTH','RTH','RTH','PRTH','PRTH','RTH','RTH','PRTH','PRTH','RTH','RTH','PRTH','PRTH','RTH','RTH']
}

df = pd.DataFrame(data)
df['DateTime'] = pd.to_datetime(df['DateTime'])
df.index = pd.to_datetime(df['DateTime'])


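# Step 1: sum and mean of Volume per year, session, and calendar day.
# The index levels are renamed so the second groupby can refer to them unambiguously.
daily = (df.groupby([df.index.strftime('%Y').rename('Year'),
                     'Session',
                     df.index.strftime('%Y-%m-%d').rename('Date')])
           .agg({'Volume': ['sum', 'mean']}))
# Step 2: aggregate the daily values again per year and session.
x = daily.groupby(level=['Year', 'Session']).agg(['sum', 'mean'])
# Keep only the columns derived from the daily sums.
x['Volume'].drop('mean', axis=1, level=0)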

User

#2 · Edited 2024-09-29 23:21:25

Does this work for you:

df['Year']=df['DateTime'].dt.year
(df
   .groupby(['Year','Session'])
   .apply(lambda x: x['Volume'].sum()/len(x['DateTime'].dt.date.unique()))
)

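Note that 'DateTime' now needs to be a regular column rather than the index.

I think this computes the average daily volume per year and per session. Can you give it a try?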
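One way to cross-check that formula is to compute the same quantity in two explicit steps: total volume per day first, then the mean of those daily totals per year and session. The sketch below assumes the sample rows printed in the question, rebuilt by hand; the `Date` column and the variable names are illustrative and not part of the original post.

```python
import pandas as pd

# Rebuild the sample rows by hand from the question's printed DataFrame.
days = ['2020-12-16', '2020-12-17', '2020-12-18', '2019-11-18']
times = ['08:00', '08:30', '09:00', '09:30']

df = pd.DataFrame({
    'DateTime': pd.to_datetime([f'{d} {t}' for d in days for t in times]),
    'Volume': [1000, 5000, 1000, 3000] + [2000] * 4 + [1000] * 8,
    'Session': ['PRTH', 'PRTH', 'RTH', 'RTH'] * 4,
})
df['Year'] = df['DateTime'].dt.year
df['Date'] = df['DateTime'].dt.normalize()  # hypothetical helper column: midnight of each day

# Step 1: total volume per year, session, and calendar day.
daily_total = df.groupby(['Year', 'Session', 'Date'])['Volume'].sum()

# Step 2: average those daily totals within each year and session.
avg_daily = daily_total.groupby(level=['Year', 'Session']).mean()
print(avg_daily)
```

Both approaches divide the total volume of a year and session by the number of distinct days that actually have rows, so days with no data are not counted as zero.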
