Answer to the question: Time series analysis with statsmodels



1 answer

Anonymous, 1 day ago

I would really like to see a data sample and a code snippet that reproduces your error. Without those, my suggestion will not address your specific error message. It will, however, let you run a multiple regression analysis on a set of time series stored in a pandas DataFrame. Assuming your series hold continuous rather than categorical values, here is how I would do it with pandas and statsmodels.

A DataFrame with random values:

<pre><code># Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

np.random.seed(1)
rows = 12
listVars = ['y', 'x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)

print(df)
</code></pre>

This prints a 12-row DataFrame of random integers between 100 and 149, indexed by daily dates from 2017-01-01 to 2017-01-12, which gives us some data to work with.

The function below lets you specify a source DataFrame, a dependent variable y, and a list of independent variables such as x1 and x2. It fits the model with statsmodels and stores a few results of interest in a DataFrame. R2 is stored as a string formatted to four decimal places, while the regression coefficients and p-values are stored as lists, because the number of those estimates varies with the number of independent variables you include in the analysis.

<pre><code>def LinReg(df, y, x, const):
    betas = x.copy()  # keep the regressor names before x is overwritten

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]],
            'stop': [df.index[-1]],
            'obs': [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data=res1)

    # Regression coefficients, formatted to two decimals
    xNames = list(model.params.index)
    xValues2 = ['%.2f' % elem for elem in model.params]
    res2 = {'Independent': [xNames],
            'coefficients': [xValues2]}
    df_res2 = pd.DataFrame(data=res2)

    # All results, transposed into a single 'results' column
    df_res = pd.concat([df_res1, df_res2], axis=1)
    df_res = df_res.T
    df_res.columns = ['results']
    return df_res
</code></pre>

Here is a test run:

<pre><code>df_regression = LinReg(df=df, y='y', x=['x1', 'x2'], const=True)
print(df_regression)
</code></pre>

Output (the row order may differ depending on your pandas version):

<pre><code>                                                        results
R2                                                       0.3650
X                                                      [x1, x2]
Y                                                             y
obs                                                          12
p             [0.7417691742514285, 0.07989515781898897, 0.25...
start                                       2017-01-01 00:00:00
stop                                        2017-01-12 00:00:00
Independent                                     [const, x1, x2]
coefficients                                [16.29, 0.47, 0.37]
</code></pre>

Here is the whole thing for easy copy-paste:

<pre><code># Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

np.random.seed(1)
rows = 12
listVars = ['y', 'x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)


def LinReg(df, y, x, const):
    betas = x.copy()  # keep the regressor names before x is overwritten

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]],
            'stop': [df.index[-1]],
            'obs': [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data=res1)

    # Regression coefficients, formatted to two decimals
    xNames = list(model.params.index)
    xValues2 = ['%.2f' % elem for elem in model.params]
    res2 = {'Independent': [xNames],
            'coefficients': [xValues2]}
    df_res2 = pd.DataFrame(data=res2)

    # All results, transposed into a single 'results' column
    df_res = pd.concat([df_res1, df_res2], axis=1)
    df_res = df_res.T
    df_res.columns = ['results']
    return df_res


df_regression = LinReg(df=df, y='y', x=['x1', 'x2'], const=True)
print(df_regression)
</code></pre>
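To make the role of the `const` argument a little more concrete, here is a rough NumPy-only sketch of what an OLS fit with and without an intercept computes. The helper name `ols_numpy` and the synthetic data are my own assumptions for illustration and are not part of statsmodels. As far as I know, statsmodels reports a centered R2 when a constant is present and an uncentered one otherwise, and the sketch mirrors that convention.

```python
import numpy as np


def ols_numpy(X, y, const=True):
    """Least-squares fit with plain NumPy; returns (coefficients, R2).

    Hypothetical helper for illustration only, not a statsmodels API.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if const:
        # Prepend a column of ones so the first coefficient is the intercept,
        # which is the effect sm.add_constant has on the design matrix
        X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    # Centered total sum of squares with an intercept, uncentered without
    ss_tot = float(((y - y.mean()) ** 2).sum()) if const else float(y @ y)
    return beta, 1.0 - ss_res / ss_tot


# Synthetic data with an exact linear relation and no noise (an assumption),
# so the fit should recover the intercept and slopes almost exactly
rng = np.random.default_rng(1)
X = rng.integers(100, 150, size=(12, 2)).astype(float)
y = 5.0 + 0.5 * X[:, 0] - 0.25 * X[:, 1]

beta, r2 = ols_numpy(X, y, const=True)
print(np.round(beta, 4), round(r2, 4))
```

With `const=False` the same helper fits a line forced through the origin, so on real data the coefficients and R2 from the two variants are not directly comparable.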
