Is a categorical variable relevant if one of its dummy variables has a t-score of 0.95?

Posted 2024-10-02 02:25:42


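If a variable's t-score is above 0.05, the variable is considered not relevant and should be excluded from the model. But what if a categorical variable has 4 dummy variables and only one of them is above 0.05? Do I exclude the whole categorical variable?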

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.803
Model:                            OLS   Adj. R-squared:                  0.801
Method:                 Least Squares   F-statistic:                     368.4
Date:                Mon, 15 Jul 2019   Prob (F-statistic):               0.00
Time:                        12:00:26   Log-Likelihood:                -17357.
No. Observations:                1460   AIC:                         3.475e+04
Df Residuals:                    1443   BIC:                         3.484e+04
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
============================================================================================
                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const                    -1.366e+05   9432.229    -14.482      0.000   -1.55e+05   -1.18e+05
OverallQual               1.327e+04   1249.192     10.622      0.000    1.08e+04    1.57e+04
ExterQual                 1.168e+04   2763.188      4.228      0.000    6262.969    1.71e+04
TotalBsmtSF                 13.7198      5.182      2.648      0.008       3.554      23.885
GrLivArea                   45.4098      2.521     18.012      0.000      40.465      50.355
1stFlrSF                     9.4573      5.543      1.706      0.088      -1.416      20.330
GarageArea                  22.4791      9.748      2.306      0.021       3.358      41.600
KitchenQual               1.309e+04   2142.662      6.111      0.000    8891.243    1.73e+04
GarageCars                8875.8202   2961.291      2.997      0.003    3066.923    1.47e+04
BsmtQual                  1.097e+04   2094.395      5.235      0.000    6856.671    1.51e+04
GarageFinish_No           2689.1356   5847.186      0.460      0.646   -8780.759    1.42e+04
GarageFinish_RFn         -8223.4503   2639.360     -3.116      0.002   -1.34e+04   -3046.057
GarageFinish_Unf         -8416.9443   2928.002     -2.875      0.004   -1.42e+04   -2673.349
BsmtExposure_Gd           2.298e+04   3970.691      5.788      0.000    1.52e+04    3.08e+04
BsmtExposure_Mn           -262.8498   4160.294     -0.063      0.950   -8423.721    7898.021
BsmtExposure_No          -7690.0994   2800.731     -2.746      0.006   -1.32e+04   -2196.159
BsmtExposure_No Basement  2.598e+04   9879.662      2.630      0.009    6598.642    4.54e+04
==============================================================================
Omnibus:                      614.604   Durbin-Watson:                   1.972
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            76480.899
Skew:                          -0.928   Prob(JB):                         0.00
Kurtosis:                      38.409   Cond. No.                     2.85e+04
==============================================================================

1 answer
User
Answer 1 · posted 2024-10-02 02:25:42

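When you say "0.05 t-score", I think you mean "0.05 p-value". The t-value is just coef / stderr, and it feeds into the p-value calculation (abs(t_value) > 2 corresponds roughly to a p-value < 0.05).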
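To make the relationship concrete, here is a minimal sketch that recomputes t and an approximate two-sided p-value for a few rows of the table above. The helper name `t_and_p` is made up for illustration, and it uses a normal approximation instead of the exact t distribution that statsmodels uses; with 1443 residual degrees of freedom the two should agree to about three decimals.

```python
import math

def t_and_p(coef, std_err):
    """Return (t, approximate two-sided p-value) for one coefficient.

    Assumption: the p-value uses a normal approximation to the t
    distribution, which is only reasonable for large residual df.
    """
    t = coef / std_err
    # Two-sided tail probability of a standard normal: P(|Z| > |t|)
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# (coef, std err) pairs copied from the regression table in the question
rows = {
    "BsmtExposure_Mn": (-262.8498, 4160.294),
    "GarageFinish_No": (2689.1356, 5847.186),
    "BsmtExposure_No": (-7690.0994, 2800.731),
}

for name, (coef, se) in rows.items():
    t, p = t_and_p(coef, se)
    print(f"{name}: t = {t:.3f}, p = {p:.3f}")
```

The 0.950 in the BsmtExposure_Mn row is therefore the p-value, not the t-value; the t-value for that row is about -0.063.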

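When you say "the categorical variable has 4 dummy variables", I assume you mean it has 4 "levels" (distinct values), and that you are referring to BsmtExposure_Mn. I would leave it in, because the other categories/levels are helping the model. If you had several categories that were poorly predictive, you could consider merging them into a single "other" category.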
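Merging levels could look roughly like the sketch below. The sample values, the helper `merge_levels`, the "Av" level, and the choice of which levels to merge are all assumptions for illustration; they are not the asker's data or a recommendation for this particular model.

```python
import pandas as pd

def merge_levels(series, levels, new_label="Other"):
    """Return a copy of `series` with every value in `levels` replaced by `new_label`."""
    return series.where(~series.isin(levels), new_label)

# Hypothetical sample values, not the asker's data
exposure = pd.Series(
    ["Gd", "Mn", "No", "Av", "No Basement", "Mn", "No"],
    name="BsmtExposure",
)

# Assumption: "Mn" and "Av" are the weakly predictive levels to pool
merged = merge_levels(exposure, ["Mn", "Av"])

# Re-encode as dummies; drop_first drops one level to serve as the baseline
dummies = pd.get_dummies(merged, prefix="BsmtExposure", drop_first=True)

print(merged.value_counts())
print(dummies.columns.tolist())
```

After merging, you would refit the model on the new dummy columns and compare it with the original fit before deciding which encoding to keep.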

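In general, you should not automatically exclude variables just because their p-value is > 0.05 (or whatever your cutoff/alpha is). They can still help you understand what is going on in the model and explain the results to others.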
