Time series analysis with statsmodels

est = smf.ols(formula='r ~ spend + date', data=df).fit()
print(est.summary())

                                                 coef  std err     t    P>|t|   [95.0% Conf. Int.]
Intercept                                  -6.249e-10      inf    -0      nan       nan       nan
date[T.Timestamp('2014-10-08 00:00:00')]   -2.571e-10      inf    -0      nan       nan       nan
date[T.Timestamp('2014-10-15 00:00:00')]    9.441e-11      inf     0      nan       nan       nan
date[T.Timestamp('2014-10-22 00:00:00')]    5.619e-11      inf     0      nan       nan       nan
date[T.Timestamp('2014-10-29 00:00:00')]   -8.035e-12      inf    -0      nan       nan       nan
date[T.Timestamp('2014-11-05 00:00:00')]    6.334e-11      inf     0      nan       nan       nan
date[T.Timestamp('2014-11-12 00:00:00')]      7.9e+04      inf     0      nan       nan       nan
date[T.Timestamp('2014-11-19 00:00:00')]     1.58e+05      inf     0      nan       nan       nan
date[T.Timestamp('2014-11-26 00:00:00')]     1.58e+05      inf     0      nan       nan       nan
date[T.Timestamp('2014-12-03 00:00:00')]     1.58e+05      inf     0      nan       nan       nan
date[T.Timestamp('2014-12-10 00:00:00')]     2.28e+05      inf     0      nan       nan       nan
date[T.Timestamp('2014-12-17 00:00:00')]     3.28e+05      inf     0      nan       nan       nan
date[T.Timestamp('2014-12-24 00:00:00')]    3.705e+05      inf     0      nan       nan       nan
spend                                       2.105e-10      inf     0      nan       nan       nan

2 answers

User

Answer #1 · edited 2024-09-28 03:24:53

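I would really like to see a data sample and a code snippet that reproduces your error. Without them, my suggestion will not address your specific error message. It will, however, let you run a multiple regression on a set of time series stored in a pandas dataframe. Assuming your time series hold continuous rather than categorical values, here is how I would do it with pandas and statsmodels: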

A dataframe with random values:

# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Reproducible sample data: 12 daily observations of y, x1, x2 and x3
np.random.seed(1)
rows = 12
listVars = ['y', 'x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)

print(df)

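This prints some data to work with: a 12-row frame with the columns y, x1, x2 and x3, indexed by daily dates from 2017-01-01 to 2017-01-12.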

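The function below lets you specify a source dataframe, a dependent variable y, and a set of independent variables such as x1 and x2. It fits the model with statsmodels and stores a few of the results in a dataframe. R2 is a single value, while the regression coefficients and p-values are lists, because the number of these estimates depends on how many independent variables you include in the analysis.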

def LinReg(df, y, x, const):

    betas = x.copy()

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]], 
            'stop': [df.index[-1]],
            'obs' : [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data = res1)

    # Regression Coefficients
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = [ '%.2f' % elem for elem in xValues ]
    res2 = {'Independent': [xNames],
            'coefficients': [xValues2]}
    df_res2 = pd.DataFrame(data = res2)

    # All results
    df_res = pd.concat([df_res1, df_res2], axis = 1)
    df_res = df_res.T
    df_res.columns = ['results']
    return(df_res)

Here is a test run:

df_regression = LinReg(df = df, y = 'y', x = ['x1', 'x2'], const = True)
print(df_regression)

Output:

                                                            results
R2                                                       0.3650
X                                                      [x1, x2]
Y                                                             y
obs                                                          12
p             [0.7417691742514285, 0.07989515781898897, 0.25...
start                                       2017-01-01 00:00:00
stop                                        2017-01-12 00:00:00
Independent                                     [const, x1, x2]
coefficients                                [16.29, 0.47, 0.37]

Here is the whole thing in one piece for easy copy-and-paste:

# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

np.random.seed(1)
rows = 12
listVars= ['y','x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars) 
df = df.set_index(rng)

def LinReg(df, y, x, const):

    betas = x.copy()

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]], 
            'stop': [df.index[-1]],
            'obs' : [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data = res1)

    # Regression Coefficients
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = [ '%.2f' % elem for elem in xValues ]
    res2 = {'Independent': [xNames],
            'coefficients': [xValues2]}
    df_res2 = pd.DataFrame(data = res2)

    # All results
    df_res = pd.concat([df_res1, df_res2], axis = 1)
    df_res = df_res.T
    df_res.columns = ['results']
    return(df_res)

df_regression = LinReg(df = df, y = 'y', x = ['x1', 'x2'], const = True)

print(df_regression)

User

Answer #2 · edited 2024-09-28 03:24:53

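You are effectively fitting a separate effect for every date, because ols treats date as a categorical variable and creates one dummy column per date. With that many dummies there are probably no residual degrees of freedom left, which would explain the inf standard errors and nan p-values. I suggest you try: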

est = smf.ols(formula='r ~ spend', data=df).fit()
print(est.summary())
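
If the point of including date was to capture a trend over time, one option is to turn it into a number first. The following is only a sketch under assumptions: the weekly data is made up to resemble the dates in the question (one row per date), and the column name `days` is my own.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical weekly data, one row per date (an assumption about the question's data)
rng = np.random.default_rng(0)
dates = pd.date_range("2014-10-01", periods=13, freq="7D")
df = pd.DataFrame({"date": dates,
                   "spend": rng.uniform(100, 200, size=len(dates))})
df["r"] = 3.0 * df["spend"] + rng.normal(0, 10, size=len(dates))

# A categorical date needs one dummy per date (minus the reference level),
# plus the intercept and spend
n_dummies = pd.get_dummies(df["date"], drop_first=True).shape[1]
n_params_categorical = 1 + n_dummies + 1
print(n_params_categorical, "parameters for", len(df), "observations")

# Alternative: a numeric trend, measured in days since the first observation
df["days"] = (df["date"] - df["date"].min()).dt.days
est = smf.ols(formula="r ~ spend + days", data=df).fit()
print(est.params)
print("residual degrees of freedom:", est.df_resid)
```

With a numeric trend the model has three parameters instead of one per date, so there are degrees of freedom left to estimate standard errors.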

For statsmodels, try:

