Python time series alignment and "to date" functions

Basket  Sale  Date        PrevSale  SaleCount  MeanToDate  MaxToDate
88      $15   3/01/2012             1
88      $30   11/02/2012  $15       2          $23         $30
88      $16   16/08/2012  $30       3          $20         $30
123     $90   18/06/2012            1
477     $77   19/08/2012            1
477     $57   11/12/2012  $77       2          $67         $77
566     $90   6/07/2012             1

2 answers

User
#1 · Edited 2024-09-24 22:22:43

import pandas as pd
pd.__version__  # u'0.24.2'

from pandas import concat

def handler(grouped):
    # Sales of a single basket as a time series ordered by date
    se = grouped.set_index('Date')['Sale'].sort_index()
    return concat(
        {
            'MeanToDate': se.expanding().mean(),   # cumulative mean
            'MaxToDate': se.expanding().max(),     # cumulative max
            'SaleCount': se.expanding().count(),   # cumulative count
            'Sale': se,                            # simple copy
            'PrevSale': se.shift(1)                # previous sale
        },
        axis=1
    )

###########################
from datetime import datetime

df = pd.DataFrame({'Basket': [88, 88, 88, 123, 477, 477, 566],
                   'Sale': [15, 30, 16, 90, 77, 57, 90],
                   'Date': [datetime.strptime(ds, '%d/%m/%Y')
                            for ds in ['3/01/2012', '11/02/2012', '16/08/2012', '18/06/2012',
                                       '19/08/2012', '11/12/2012', '6/07/2012']]})

#########
new_df = df.groupby('Basket').apply(handler).reset_index()
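As a cross-check on the handler above, the same columns can probably be built without `apply`, using the cumulative methods that pandas exposes on grouped series. This is only a sketch: the name `out` and the idea of deriving the running mean as cumulative sum divided by running count are choices made for this illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question; dates are day/month/year.
df = pd.DataFrame({
    "Basket": [88, 88, 88, 123, 477, 477, 566],
    "Sale": [15, 30, 16, 90, 77, 57, 90],
    "Date": pd.to_datetime(
        ["3/01/2012", "11/02/2012", "16/08/2012", "18/06/2012",
         "19/08/2012", "11/12/2012", "6/07/2012"],
        format="%d/%m/%Y",
    ),
})

# Order each basket's sales by date so "previous" and "to date" are well defined.
out = df.sort_values(["Basket", "Date"]).reset_index(drop=True)
sales = out.groupby("Basket")["Sale"]

out["PrevSale"] = sales.shift(1)                        # previous sale in the same basket
out["SaleCount"] = sales.cumcount() + 1                 # running count, starting at 1
out["MeanToDate"] = sales.cumsum() / out["SaleCount"]   # running mean
out["MaxToDate"] = sales.cummax()                       # running maximum

print(out)
```

Unlike the table in the question, this fills MeanToDate and MaxToDate on each basket's first sale as well; those cells can be blanked afterwards where SaleCount equals 1 if the original layout is required.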

User
#2 · Edited 2024-09-24 22:22:43

This should do it:
from pandas import concat
# Note: pandas.stats.moments exists only in old pandas releases; on newer
# versions use se.expanding().mean() and se.expanding().count() as in answer #1.
from pandas.stats.moments import expanding_mean, expanding_count

def handler(grouped):
    se = grouped.set_index('Date')['Sale'].sort_index()
    # se is the (ordered) time series of sales restricted to a single basket
    # we can now create a dataframe by combining different metrics
    # pandas has a function for each of the ones you are interested in!
    return concat(
        {
            'MeanToDate': expanding_mean(se),    # cumulative mean
            'MaxToDate': se.cummax(),            # cumulative max
            'SaleCount': expanding_count(se),    # cumulative count
            'Sale': se,                          # simple copy
            'PrevSale': se.shift(1)              # previous sale
        },
        axis=1
    )

# we then apply this handler to all the groups and pandas combines them
# back into a single dataframe indexed by (Basket, Date)
# we simply need to reset the index to get the shape you mention in your question
new_df = df.groupby('Basket').apply(handler).reset_index()
You can read more about grouping/aggregating here.
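The two answers differ only in how they spell the "to date" calculations. The short sketch below, which assumes a reasonably recent pandas with the `.expanding()` API and uses only basket 88's three sales from the question, compares the expanding-window calls with the cumulative shortcut on a single series. The variable names are illustrative.

```python
import pandas as pd

# Sales for basket 88 from the question, already in date order.
se = pd.Series([15, 30, 16], name="Sale")

mean_to_date = se.expanding().mean()     # mean of all sales up to each row
count_to_date = se.expanding().count()   # number of sales up to each row
max_expanding = se.expanding().max()     # running max via an expanding window
max_cummax = se.cummax()                 # running max via the cumulative shortcut

# Both spellings of the running maximum yield the same values;
# the expanding version returns floats, cummax keeps the integer dtype.
print((max_expanding == max_cummax).all())
```

For the maximum, either spelling works; `cummax` is a little shorter and preserves the original dtype.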
