如果变量得分超过0.05 t,则认为该变量不相关,应将其从模型中排除。但是,如果分类变量有4个伪变量,并且其中只有一个超过0.05,该怎么办?我是否排除了整个分类变量
OLS Regression Results
==============================================================================
Dep. Variable: SalePrice R-squared: 0.803
Model: OLS Adj. R-squared: 0.801
Method: Least Squares F-statistic: 368.4
Date: Mon, 15 Jul 2019 Prob (F-statistic): 0.00
Time: 12:00:26 Log-Likelihood: -17357.
No. Observations: 1460 AIC: 3.475e+04
Df Residuals: 1443 BIC: 3.484e+04
Df Model: 16
Covariance Type: nonrobust
============================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------------
const -1.366e+05 9432.229 -14.482 0.000 -1.55e+05 -1.18e+05
OverallQual 1.327e+04 1249.192 10.622 0.000 1.08e+04 1.57e+04
ExterQual 1.168e+04 2763.188 4.228 0.000 6262.969 1.71e+04
TotalBsmtSF 13.7198 5.182 2.648 0.008 3.554 23.885
GrLivArea 45.4098 2.521 18.012 0.000 40.465 50.355
1stFlrSF 9.4573 5.543 1.706 0.088 -1.416 20.330
GarageArea 22.4791 9.748 2.306 0.021 3.358 41.600
KitchenQual 1.309e+04 2142.662 6.111 0.000 8891.243 1.73e+04
GarageCars 8875.8202 2961.291 2.997 0.003 3066.923 1.47e+04
BsmtQual 1.097e+04 2094.395 5.235 0.000 6856.671 1.51e+04
GarageFinish_No 2689.1356 5847.186 0.460 0.646 -8780.759 1.42e+04
GarageFinish_RFn -8223.4503 2639.360 -3.116 0.002 -1.34e+04 -3046.057
GarageFinish_Unf -8416.9443 2928.002 -2.875 0.004 -1.42e+04 -2673.349
BsmtExposure_Gd 2.298e+04 3970.691 5.788 0.000 1.52e+04 3.08e+04
BsmtExposure_Mn -262.8498 4160.294 -0.063 0.950 -8423.721 7898.021
BsmtExposure_No -7690.0994 2800.731 -2.746 0.006 -1.32e+04 -2196.159
BsmtExposure_No Basement 2.598e+04 9879.662 2.630 0.009 6598.642 4.54e+04
==============================================================================
Omnibus: 614.604 Durbin-Watson: 1.972
Prob(Omnibus): 0.000 Jarque-Bera (JB): 76480.899
Skew: -0.928 Prob(JB): 0.00
Kurtosis: 38.409 Cond. No. 2.85e+04
==============================================================================
当你说“0.05 t分数”时,我想你的意思是“0.05 p值”。t值仅为
coef / stderr
,进入p值计算(abs(t_value) > 2
约为p值<;(0.05)当你说“分类变量有4个伪变量”时,我想你的意思是它有4个“级别”/不同的值,你指的是
BsmtExposure_Mn
。我会把它留在这里,因为其他类别/级别都在帮助这个模型。如果您有几个预测性较差的类别,您可以考虑将它们合并到一个“其他”类别中一般来说,您不应该自动排除变量,因为它们的p值为>;0.05(或无论您的截止值/α值是多少)。它们有助于理解模型中发生的事情,并向其他人解释结果
相关问题 更多 >
编程相关推荐