Robustness issue with statsmodels linear regression (OLS) - Python

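import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

nbData = 1000
rand1 = np.random.uniform(size=nbData)
rand2 = np.random.uniform(size=nbData)
# a, b, c are mutually exclusive dummies (exactly one is 1 per row); so are d, e
a = 1 * (rand1 <= (1.0/3.0))
b = 1 * (((1.0/3.0) < rand1) & (rand1 < (4/5.0)))
c = 1 - b - a
d = 1 * (rand2 <= (3.0/5.0))
e = 1 - d
weights = [1, 2, 3, 1, 2]  # defined but never used below
y = a + 2*b + 3*c + 4*d + 5*e
df = pd.DataFrame({'y': y, 'a': a, 'b': b, 'c': c, 'd': d, 'e': e})
# "- 1" removes the intercept, so all five dummies enter the design matrix
mod = ols(formula='y ~ a + b + c + d + e - 1', data=df)
res = mod.fit()
print(res.summary())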

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.006e+30
Date:                Wed, 16 Sep 2015   Prob (F-statistic):               0.00
Time:                        03:05:40   Log-Likelihood:                 3156.8
No. Observations:                 100   AIC:                            -6306.
Df Residuals:                      96   BIC:                            -6295.
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
a              1.6000   7.47e-16   2.14e+15      0.000         1.600     1.600
b              2.6000   6.11e-16   4.25e+15      0.000         2.600     2.600
c              3.6000   9.61e-16   3.74e+15      0.000         3.600     3.600
d              3.4000   5.21e-16   6.52e+15      0.000         3.400     3.400
e              4.4000   6.85e-16   6.42e+15      0.000         4.400     4.400
==============================================================================
Omnibus:                       11.299   Durbin-Watson:                   0.833
Prob(Omnibus):                  0.004   Jarque-Bera (JB):                5.720
Skew:                          -0.381   Prob(JB):                       0.0573
Kurtosis:                       2.110   Cond. No.                     2.46e+15
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.67e-29. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.167
Model:                            OLS   Adj. R-squared:                  0.161
Method:                 Least Squares   F-statistic:                     29.83
Date:                Wed, 16 Sep 2015   Prob (F-statistic):           1.23e-22
Time:                        03:08:04   Log-Likelihood:                -701.02
No. Observations:                 600   AIC:                             1412.
Df Residuals:                     595   BIC:                             1434.
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
a              5.8070   1.15e+13   5.05e-13      1.000     -2.26e+13  2.26e+13
b              6.4951   1.15e+13   5.65e-13      1.000     -2.26e+13  2.26e+13
c              6.9033   1.15e+13   6.01e-13      1.000     -2.26e+13  2.26e+13
d             -1.1927   1.15e+13  -1.04e-13      1.000     -2.26e+13  2.26e+13
e             -0.1685   1.15e+13  -1.47e-14      1.000     -2.26e+13  2.26e+13
==============================================================================
Omnibus:                       67.153   Durbin-Watson:                   0.328
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               70.964
Skew:                           0.791   Prob(JB):                     3.89e-16
Kurtosis:                       2.419   Cond. No.                     7.70e+14
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.25e-28. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

1 Answer

User

#1 · Posted 2024-09-26 18:08:01

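As Mr. F mentioned, the main problem is that statsmodels OLS does not seem to handle the collinearity problem in this case as well as Excel/R do. However, if instead of defining one variable for each of a, b, c, d and e, you define one variable X and one variable Z, where X can equal a, b or c and Z can equal d or e, then the regression works fine. I.e., update the code as follows: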

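# Encode each group of dummies as a single categorical column.
# .loc avoids chained assignment (df.X[mask] = ...), which is unreliable in current pandas.
df['X'] = 'c'
df.loc[df.b != 0, 'X'] = 'b'
df.loc[df.a != 0, 'X'] = 'a'
df['Z'] = 'e'
df.loc[df.d != 0, 'Z'] = 'd'
mod = ols(formula='y ~ X + Z - 1', data=df)
res = mod.fit()
print(res.summary())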

which gives the expected result:

                           OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 2.684e+27
Date:                Thu, 17 Sep 2015   Prob (F-statistic):               0.00
Time:                        06:22:43   Log-Likelihood:             2.5096e+06
No. Observations:              100000   AIC:                        -5.019e+06
Df Residuals:                   99996   BIC:                        -5.019e+06
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
X[a]           5.0000   1.85e-14    2.7e+14      0.000         5.000     5.000
X[b]           6.0000   1.62e-14   3.71e+14      0.000         6.000     6.000
X[c]           7.0000   2.31e-14   3.04e+14      0.000         7.000     7.000
Z[T.e]         1.0000   1.97e-14   5.08e+13      0.000         1.000     1.000
==============================================================================
Omnibus:                      145.367   Durbin-Watson:                   1.353
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             9729.487
Skew:                          -0.094   Prob(JB):                         0.00
Kurtosis:                       1.483   Cond. No.                         2.29
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
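
To make the collinearity concrete, here is a minimal sketch (not part of the original answer) that rebuilds the question's data and checks the rank of the dummy columns. Since a + b + c = 1 and d + e = 1 on every row, the five columns should span only four dimensions, which is why `Df Model` is 3 in the summary above and why the formula interface drops one level of Z. The fixed seed and the names `rng`, `full_rank`, `reduced_rank` and `res_reduced` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Fixed seed is an assumption, used only to make the sketch reproducible.
rng = np.random.default_rng(0)
n = 1000
rand1 = rng.uniform(size=n)
rand2 = rng.uniform(size=n)

# Same construction as in the question: two groups of exclusive dummies.
a = (rand1 <= 1/3).astype(int)
b = ((rand1 > 1/3) & (rand1 < 4/5)).astype(int)
c = 1 - a - b
d = (rand2 <= 3/5).astype(int)
e = 1 - d
y = a + 2*b + 3*c + 4*d + 5*e
df = pd.DataFrame({'y': y, 'a': a, 'b': b, 'c': c, 'd': d, 'e': e})

# a + b + c == d + e == 1, so one of the five columns is redundant.
full_rank = np.linalg.matrix_rank(df[['a', 'b', 'c', 'd', 'e']].to_numpy())
reduced_rank = np.linalg.matrix_rank(df[['a', 'b', 'c', 'e']].to_numpy())
print(full_rank, reduced_rank)

# Dropping d by hand makes it the reference level, which mirrors the
# 'y ~ X + Z - 1' parameterisation (X[a], X[b], X[c], Z[T.e]).
res_reduced = ols('y ~ a + b + c + e - 1', data=df).fit()
print(res_reduced.params)
```

Under these assumptions both ranks should come out as 4, and the reduced fit should recover coefficients close to 5, 6, 7 for a, b, c and 1 for e, matching the X/Z regression above. Dropping d rather than e is an arbitrary choice; any single column from either group could serve as the reference.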
