Python中面板数据的多重共线性

# Load the data (requires statsmodels and linearmodels) import statsmodels.api as sm from linearmodels.datasets import wage_panel import pandas as pd data = wage_panel.load() year = pd.Categorical(data.year) data = data.set_index(['nr', 'year']) data['year'] = year print(wage_panel.DESCR) print(data.head()) # Run the panel regression from linearmodels.panel import PanelOLS exog_vars = ['exper','union','married'] exog = sm.add_constant(data[exog_vars]) mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True) fe_te_res = mod.fit() print(fe_te_res)

> . xtreg lwage exper union married i.year, fe note: 1987.year omitted because of collinearity Fixed-effects (within) regression Number of obs = 4360 Group variable: nr Number of groups = 545 R-sq: within = 0.1689 Obs per group: min = 8 between = 0.0000 avg = 8.0 overall = 0.0486 max = 8 F(9,3806) = 85.95 corr(u_i, Xb) = -0.1747 Prob > F = 0.0000 ------------------------------------------------------------------------------ lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- exper | .0638624 .0032594 19.59 0.000 .0574721 .0702527 union | .0833697 .0194393 4.29 0.000 .0452572 .1214821 married | .0583372 .0183688 3.18 0.002 .0223235 .0943509 | year | 1981 | .0496865 .0200714 2.48 0.013 .0103348 .0890382 1982 | .0399445 .019123 2.09 0.037 .0024521 .0774369 1983 | .0193513 .018662 1.04 0.300 -.0172373 .0559398 1984 | .0229574 .0186503 1.23 0.218 -.0136081 .0595229 1985 | .0081499 .0191359 0.43 0.670 -.0293677 .0456674 1986 | .0036329 .0200851 0.18 0.856 -.0357456 .0430115 1987 | 0 (omitted) | _cons | 1.169184 .0231221 50.57 0.000 1.123851 1.214517 -------------+---------------------------------------------------------------- sigma_u | .40761229 sigma_e | .35343397 rho | .57083029 (fraction of variance due to u_i) ------------------------------------------------------------------------------

3条回答

网友

1楼 · 编辑于 2024-06-26 13:51:06

我唯一能想到的办法就是手动。在

样本数据

import pandas as pd
import numpy as np
import statsmodels.api as sm

from linearmodels.datasets import wage_panel
from linearmodels.panel import PanelOLS

data = wage_panel.load()

首先，我们将跟随Stata的脚步，为每年的固定效果生成一个假人，我们省略了第一个值，按字典顺序排序（用drop_first=True参数完成）。重要的是使用np.unique来获得标签，因为这也是排序。无需statsmodels添加常量，只需自己添加：

^{pr2}$

如果我们试图运行这个模型，它会因为完全共线而失败。因为我们在内进行回归分析，我们不能仅仅在exog上测试共线性，我们需要首先在面板内进行demean，因为demeaned自变量的共线性才是最重要的。我将在这里复制一份，以免弄乱我们将在最终回归中使用的exog。在

exog2 = exog.copy()
exog2 = exog2 - exog2.groupby(level=0).transform('mean') + exog2.mean()

我们现在可以看到有完美的共线性；当我们将一个内的平均外显变量与其他变量进行回归时，某些回归得到了一个完美的R平方：

for col in exog2:
    print(col, sm.OLS(exog2[col], exog2.drop(columns=col)).fit().rsquared)

#exper 1.0
#union 0.004326532427527674
#married 0.18901677578724896
#1981 1.0
#1982 1.0
#1983 1.0
#1984 1.0
#1985 1.0
#1986 1.0
#1987 1.0

现在Stata或其他软件包如何决定删除哪个变量对我来说是个谜。在这种情况下，你可能会倾向于放弃一个你的年度假人而不是其他变量。让我们选1987年，这样我们就能得到和斯塔塔一样的结果。在

exog2 = exog2.drop(columns=1987)
for col in exog2:
    print(col, sm.OLS(exog2[col], exog2.drop(columns=col)).fit().rsquared)

#exper 0.48631
#union 0.0043265
#married 0.1890
#1981 0.34978
#1982 0.28369
#1983 0.2478680
#1984 0.2469180
#1985 0.2846552
#1986 0.35067075
#constant 0.94641

所以我们已经处理了共线，可以回到模型。由于我们已经手动包含了年度固定效应，所以我们可以从模型中删除time_effects。在

mod = PanelOLS(data.lwage, exog.drop(columns=1987), entity_effects=True, time_effects=False)
print(mod.fit())

                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                  lwage   R-squared:                        0.1689
Estimator:                   PanelOLS   R-squared (Between):             -0.0882
No. Observations:                4360   R-squared (Within):               0.1689
Date:                Sat, Mar 09 2019   R-squared (Overall):              0.0308
Time:                        00:59:14   Log-likelihood                   -1355.7
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      85.946
Entities:                         545   P-value                           0.0000
Avg Obs:                       8.0000   Distribution:                  F(9,3806)
Min Obs:                       8.0000                                           
Max Obs:                       8.0000   F-statistic (robust):             85.946
                                        P-value                           0.0000
Time periods:                       8   Distribution:                  F(9,3806)
Avg Obs:                       545.00                                           
Min Obs:                       545.00                                           
Max Obs:                       545.00                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
                                       
exper          0.0639     0.0033     19.593     0.0000      0.0575      0.0703
union          0.0834     0.0194     4.2887     0.0000      0.0453      0.1215
married        0.0583     0.0184     3.1759     0.0015      0.0223      0.0944
1981           0.0497     0.0201     2.4755     0.0133      0.0103      0.0890
1982           0.0399     0.0191     2.0888     0.0368      0.0025      0.0774
1983           0.0194     0.0187     1.0369     0.2998     -0.0172      0.0559
1984           0.0230     0.0187     1.2309     0.2184     -0.0136      0.0595
1985           0.0081     0.0191     0.4259     0.6702     -0.0294      0.0457
1986           0.0036     0.0201     0.1809     0.8565     -0.0357      0.0430
constant       1.1692     0.0231     50.566     0.0000      1.1239      1.2145
==============================================================================

与Stata输出相匹配。实际上没有任何理由放弃1987年，我们可以选择其他的东西，但至少有了这个，我们可以看到结果与xtreg相匹配。在

样本数据

相关问题更多 >

编程相关推荐

热门问题

热门文章