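<p>I would really like to see a data sample and a code snippet that reproduces your error.
Without those, my suggestion will not address your specific error message. It will, however, let you run a multiple regression analysis on a set of time series stored in a pandas DataFrame. Assuming the series hold continuous rather than categorical values, here is how I would do it with pandas and statsmodels:</p>
<p>A DataFrame with random values:</p>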
<pre><code># Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Reproducible random data: 12 daily observations of y, x1, x2 and x3
np.random.seed(1)
rows = 12
listVars = ['y', 'x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)
print(df)
</code></pre>
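<p>This prints some data to work with: a 12-row DataFrame of random integers from 100 to 149 in the columns y, x1, x2 and x3, indexed by daily dates starting at 2017-01-01.</p>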
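<p>The function below lets you specify a source DataFrame, a dependent variable <strong>y</strong> and a list of independent variables such as <strong>x1, x2</strong>. Using statsmodels, a selection of results is stored in a DataFrame. R2 is stored as a single value (formatted to four decimals), while the regression coefficients and p-values are stored as lists, because the number of these estimates varies with the number of independent variables you include in the analysis.</p>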
<pre><code>def LinReg(df, y, x, const):
    # Keep the regressor names; x is overwritten below when a constant is added
    betas = x.copy()

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]],
            'stop': [df.index[-1]],
            'obs': [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data=res1)

    # Regression coefficients (including the constant, if there is one)
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = ['%.2f' % elem for elem in xValues]
    res2 = {'Independent': [xNames],
            'beta': [xValues2]}
    df_res2 = pd.DataFrame(data=res2)

    # All results, transposed into a single 'results' column
    df_res = pd.concat([df_res1, df_res2], axis=1)
    df_res = df_res.T
    df_res.columns = ['results']
    return df_res
</code></pre>
<p>Here is a test run:</p>
<pre><code>df_regression = LinReg(df = df, y = 'y', x = ['x1', 'x2'], const = True)
print(df_regression)
</code></pre>
<p>Output (the row order may differ depending on your pandas version):</p>
<pre><code>                                                       results
R2                                                      0.3650
X                                                     [x1, x2]
Y                                                            y
obs                                                         12
p            [0.7417691742514285, 0.07989515781898897, 0.25...
start                                      2017-01-01 00:00:00
stop                                       2017-01-12 00:00:00
Independent                                    [const, x1, x2]
beta                                       [16.29, 0.47, 0.37]
</code></pre>
<p>Here is the whole thing for an easy copy and paste:</p>
<pre><code># Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Reproducible random data: 12 daily observations of y, x1, x2 and x3
np.random.seed(1)
rows = 12
listVars = ['y', 'x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)


def LinReg(df, y, x, const):
    # Keep the regressor names; x is overwritten below when a constant is added
    betas = x.copy()

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]],
            'stop': [df.index[-1]],
            'obs': [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data=res1)

    # Regression coefficients (including the constant, if there is one)
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = ['%.2f' % elem for elem in xValues]
    res2 = {'Independent': [xNames],
            'beta': [xValues2]}
    df_res2 = pd.DataFrame(data=res2)

    # All results, transposed into a single 'results' column
    df_res = pd.concat([df_res1, df_res2], axis=1)
    df_res = df_res.T
    df_res.columns = ['results']
    return df_res


df_regression = LinReg(df=df, y='y', x=['x1', 'x2'], const=True)
print(df_regression)
</code></pre>