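I'm trying to implement a random effects model in pandas, but my regression coefficients don't match Stata's output. The data are airline routes and ticket fares. Here is my Python code: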
import pandas as pd
import pandas.stats.plm as plm
airline = pd.read_csv("C:...\Airline.csv")
airline['constant'] = 1.0
airline = airline.set_index(['route', 'time'])
airlinePanel = airline.to_panel()
airlineRE = plm.PanelOLS(
    y=airlinePanel['lnMktfare'],
    x=airlinePanel[['constant', 'mktdistance', 'passengers', 'percentAA', 'percentAS',
                    'percentDL', 'percentHA', 'percentNK', 'percentUA', 'percentUS', 'percentWN']],
    intercept=True, time_effects=True, dropped_dummies=True, verbose=True)
print airlineRE
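And the output:
[Python output not preserved in the source]
First, before I get to the Stata output, does anyone know why I don't get an intercept term even though I set intercept=True? Even when I add one manually to the regression equation, Python estimates the constant as follows: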
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
      constant     0.0000        nan        nan        nan        nan        nan
None of the other estimates change. Now for the Stata code:
import delimited "C:...\Airline.csv", clear
xtset route time
xtreg lnmktfare mktdistance passengers percent*
Stata output:
Random-effects GLS regression                   Number of obs     =     88,000
Group variable: route                           Number of groups  =      1,000

R-sq:                                           Obs per group:
     within  = 0.2983                                         min =         88
     between = 0.6943                                         avg =       88.0
     overall = 0.3154                                         max =         88

                                                Wald chi2(97)     =   39530.19
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
   lnmktfare |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 mktdistance |   .0002374   1.78e-06   133.40   0.000     .0002339    .0002409
  passengers |  -.0000382   8.90e-07   -42.91   0.000    -.0000399   -.0000364
   percentAA |   .1340237   .0058275    23.00   0.000      .122602    .1454454
   percentAS |   .1159311    .006403    18.11   0.000     .1033815    .1284807
   percentDL |   .2689447   .0039186    68.63   0.000     .2612644     .276625
   percentHA |  -.0637648   .1378896    -0.46   0.644    -.3340235    .2064939
   percentNK |  -.4974099   .0131605   -37.80   0.000     -.523204   -.4716158
   percentUA |   .1653212   .0055116    30.00   0.000     .1545187    .1761236
   percentUS |   .1784333   .0046914    38.03   0.000     .1692383    .1876283
   percentWN |  -.1531444   .0041407   -36.98   0.000    -.1612601   -.1450286
       _cons |   4.893488    .011821   413.97   0.000     4.870319    4.916657
-------------+----------------------------------------------------------------
     sigma_u |  .02593863
     sigma_e |  .36056598
         rho |  .00514853   (fraction of variance due to u_i)
------------------------------------------------------------------------------
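I don't know why the coefficients from the two programs differ, but the gap is large enough that I'm worried about the accuracy of pandas. My main questions are (1) why don't I get an intercept term from pandas, and (2) why don't the coefficients from the two packages match? Note that I compared OLS, Logit, and IV2SLS models between Python and Stata and the results matched exactly, which makes me think something may be wrong with the random effects implementation in pandas. I'm running Python 2.7.9 in IPython 3.0.0, and Stata 14.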
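Your Python code is estimating fixed effects, not random effects. You can see this from the degrees of freedom: over 1000 in the Python output versus fewer than 100 in the Stata output. Unlike fixed effects, random effects are not treated as parameters to be estimated. They are assumed to be uncorrelated with X but to have a particular error structure, which makes RE more efficient than pooled OLS.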
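To make the difference concrete, here is a minimal sketch of the quasi-demeaning view of random effects GLS, assuming a balanced panel (the Stata output reports 88 observations for every route). The function names `re_theta`, `quasi_demean`, and `ols_slope` and the four-row toy panel are hypothetical illustrations, not part of pandas or Stata. The only real inputs are `sigma_u`, `sigma_e`, and the 88 periods taken from the Stata output above.

```python
from math import sqrt


def re_theta(sigma_u, sigma_e, periods):
    """Quasi-demeaning weight for a balanced panel (textbook RE GLS formula)."""
    return 1.0 - sqrt(sigma_e ** 2 / (sigma_e ** 2 + periods * sigma_u ** 2))


def quasi_demean(values, groups, theta):
    """Subtract theta times the group mean from every observation."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - theta * totals[g] / counts[g] for v, g in zip(values, groups)]


def ols_slope(y, x):
    """Slope of a one-regressor OLS fit that includes an intercept."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Values copied from the Stata output; 88 periods per route (balanced panel).
sigma_u, sigma_e, periods = 0.02593863, 0.36056598, 88
rho = sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)  # share of variance due to u_i
theta = re_theta(sigma_u, sigma_e, periods)

# Hypothetical toy panel: two routes, two periods each.
groups = ["a", "a", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 5.0, 6.0]

# theta = 0 leaves the data untouched (pooled OLS);
# theta = 1 removes the full group mean (fixed effects / within estimator).
pooled_slope = ols_slope(quasi_demean(y, groups, 0.0), quasi_demean(x, groups, 0.0))
within_slope = ols_slope(quasi_demean(y, groups, 1.0), quasi_demean(x, groups, 1.0))

print("rho = %.8f, theta = %.4f" % (rho, theta))
print("pooled slope = %.3f, within slope = %.3f" % (pooled_slope, within_slope))
```

With the Stata variance components, `rho` reproduces the reported .00514853 and `theta` comes out at roughly 0.17, so under these assumptions the random effects estimator sits much closer to pooled OLS than to the within (fixed effects) estimator. That is one reason a fixed effects fit can give noticeably different coefficients from `xtreg`'s default random effects fit on the same data.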