How to properly iterate over a time-series DataFrame

import pandas as pd

def make_predictions(df):
    res = pd.DataFrame()
    for ticker in df.ticker.unique():
        df_ticker = df[df['ticker'] == ticker]
        for i, _ in df_ticker.iterrows():
            X = df_ticker[0:i]
            X = do_preparations(X)          # do some processing to prepare the data
            m = train_model(X)              # train the model
            forecast = make_predictions(m)  # predict one week
            df_ticker.loc[i, 'preds'] = forecast['y'][0]
        res = pd.concat([res, df_ticker])
    return res

1 answer

User

Answer #1 · posted 2024-05-26 00:33:32

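Consider a few points:

First, avoid quadratic copying by not calling pd.concat inside the loop, since each call copies all the rows accumulated so far. Instead, build a list or dict of data frames and concatenate them once, outside the loop.
Second, avoid DataFrame.iterrows, since you only use i. Iterate over the index instead.
Third, for compactness, avoid unique() followed by boolean subsetting [...]. Instead, use groupby() inside a dict or list comprehension, which may be slightly faster than the list.append approach. Because each group needs several processing steps, this calls for a helper function defined inside make_predictions.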
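A minimal sketch of the three points on toy data, assuming a frame with a `ticker` column and a value column `y` (both names are illustrative). A running total stands in for the real per-ticker processing: groups come from `groupby()`, rows are visited through the index rather than `iterrows()`, and the pieces are concatenated once at the end.

```python
import pandas as pd

# Toy data (illustrative): two tickers with a few observations each
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "y": [1.0, 2.0, 10.0, 20.0, 30.0],
})

pieces = []                                  # collect per-ticker frames here
for ticker, part in df.groupby("ticker"):    # replaces unique() + boolean subsetting
    part = part.copy()                       # work on a copy, not a slice of df
    running = 0.0
    for i in part.index:                     # index labels only; no per-row Series as with iterrows()
        running += part.loc[i, "y"]
        part.loc[i, "running_total"] = running
    pieces.append(part)

res = pd.concat(pieces)                      # a single concatenation, outside the loop
print(res)
```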

The inner loop is probably unavoidable, since you are effectively fitting a separate model at each time step.
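Each iteration refits on an expanding window of the history, so in general the per-step work cannot be collapsed into a single vectorized call. Below is a sketch with a hypothetical `fit_and_forecast` helper, where a plain mean stands in for a real model. Only for trivial "models" like this one does a built-in such as `Series.expanding().mean()` replace the loop.

```python
import pandas as pd

# Hypothetical stand-in for train_model + prediction: "fit" the mean of the history
def fit_and_forecast(history: pd.Series) -> float:
    return float(history.mean())

# Labels deliberately do not start at 0, as in a groupby subset
y = pd.Series([1.0, 2.0, 3.0, 6.0], index=[10, 11, 12, 13])

preds_by_label = {}
for pos, label in enumerate(y.index):
    window = y.iloc[: pos + 1]                       # expanding window: rows up to and including this one
    preds_by_label[label] = fit_and_forecast(window)  # one freshly "fitted" model per time step

preds = pd.Series(preds_by_label)
print(preds)
```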

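import pandas as pd

def make_predictions(df):

    def proc_model(sub_df):
        sub_df = sub_df.copy()                   # work on a copy of the group to avoid SettingWithCopyWarning

        for i in sub_df.index:
            X = sub_df.loc[:i]                   # all rows up to and including label i (group labels need not start at 0)
            X = do_preparations(X)               # do some processing to prepare the data
            m = train_model(X)                   # train the model
            # placeholder name for your forecasting function; calling make_predictions(m)
            # here would recurse into this outer function
            forecast = predict_one_week(m)       # predict one week

            sub_df.loc[i, 'preds'] = forecast['y'].iloc[0]   # first row by position

        return sub_df

    # BUILD DICTIONARY OF DATA FRAMES
    df_dict = {i: proc_model(g) for i, g in df.groupby('ticker')}

    # CONCATENATE DATA FRAMES
    res = pd.concat(df_dict, ignore_index=True)

    return res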
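To try the structure end to end, here is a self-contained sketch with hypothetical stub helpers: `do_preparations`, `train_model` and `predict_one_week` are stand-ins, and `run_predictions` is an illustrative name. It also swaps the per-row `.loc` writes for collecting the predictions in a list and assigning the column once with `DataFrame.assign`, a small variation rather than the answer's exact method. As in the code above, each training window includes the current row; use `iloc[:pos]` if the model must not see it (the first window would then be empty).

```python
import pandas as pd

# Hypothetical stubs standing in for the question's helpers
def do_preparations(X: pd.DataFrame) -> pd.DataFrame:
    return X[["y"]]                          # keep only the target column

def train_model(X: pd.DataFrame) -> float:
    return float(X["y"].mean())              # "model" = mean of the training window

def predict_one_week(m: float) -> pd.DataFrame:
    return pd.DataFrame({"y": [m]})          # one-row forecast frame with a 'y' column

def run_predictions(df: pd.DataFrame) -> pd.DataFrame:
    def proc_model(sub_df: pd.DataFrame) -> pd.DataFrame:
        preds = []
        for pos in range(len(sub_df)):
            X = do_preparations(sub_df.iloc[: pos + 1])   # expanding window by position
            m = train_model(X)
            preds.append(predict_one_week(m)["y"].iloc[0])
        # assign the whole column once instead of one .loc write per row
        return sub_df.assign(preds=preds)

    # one group per ticker, concatenated once at the end
    return pd.concat([proc_model(g) for _, g in df.groupby("ticker")], ignore_index=True)

df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "AAA", "BBB"],
    "y": [1.0, 10.0, 3.0, 30.0],
})
out = run_predictions(df)
print(out)
```

The result is ordered by ticker rather than by the original row order, because `groupby` sorts its keys by default.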
