Why do random effects in Python not match Stata?

Posted 2024-05-20 02:31:12


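I am trying to fit a random effects model in pandas, but my regression coefficients do not match Stata's output. My data are airline routes and ticket fares. Here is my Python code: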

import pandas as pd
import pandas.stats.plm as plm

airline = pd.read_csv("C:...\Airline.csv")
airline['constant'] = 1.0
airline = airline.set_index(['route', 'time'])
airlinePanel = airline.to_panel()


airlineRE = plm.PanelOLS(y = airlinePanel['lnMktfare'], x=airlinePanel[['constant', 'mktdistance', 'passengers', 'percentAA', 'percentAS',
            'percentDL', 'percentHA', 'percentNK', 'percentUA', 'percentUS', 'percentWN']],
            intercept= True, time_effects=True, dropped_dummies=True, verbose=True)
print airlineRE

And the output:

[Python output missing from the source]

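First, before I get to the Stata output, does anyone know why I do not get an intercept term even though I set intercept=True? Even when I add a constant to the regression by hand, Python estimates it as follows: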

-----------------------Summary of Estimated Coefficients------------------------
Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
constant     0.0000        nan        nan        nan        nan        nan

None of the other estimates change. Now for the Stata code:

import delimited "C:...\Airline.csv", clear
xtset route time
xtreg lnmktfare mktdistance passengers percent*

Stata output:

Random-effects GLS regression                   Number of obs     =     88,000
Group variable: route                          Number of groups  =      1,000

R-sq:                                           Obs per group:
     within  = 0.2983                                         min =         88
     between = 0.6943                                         avg =       88.0
     overall = 0.3154                                         max =         88

                                                Wald chi2(97)     =   39530.19
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
 lnmktfare   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 mktdistance |   .0002374   1.78e-06   133.40   0.000     .0002339    .0002409
 passengers  |  -.0000382   8.90e-07   -42.91   0.000    -.0000399   -.0000364

 percentAA   |   .1340237   .0058275    23.00   0.000      .122602    .1454454
 percentAS   |   .1159311    .006403    18.11   0.000     .1033815    .1284807
 percentDL   |   .2689447   .0039186    68.63   0.000     .2612644     .276625
 percentHA   |  -.0637648   .1378896    -0.46   0.644    -.3340235    .2064939
 percentNK   |  -.4974099   .0131605   -37.80   0.000     -.523204   -.4716158
 percentUA   |   .1653212   .0055116    30.00   0.000     .1545187    .1761236
 percentUS   |   .1784333   .0046914    38.03   0.000     .1692383    .1876283
 percentWN   |  -.1531444   .0041407   -36.98   0.000    -.1612601   -.1450286
     _cons   |   4.893488    .011821   413.97   0.000     4.870319    4.916657
-------------+----------------------------------------------------------------
   sigma_u   |  .02593863
   sigma_e   |  .36056598
       rho   |  .00514853   (fraction of variance due to u_i)
------------------------------------------------------------------------------
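
As a quick consistency check on the table above, the rho row can be reproduced from the sigma_u and sigma_e rows. This is a minimal sketch that assumes rho is the usual variance share sigma_u^2 / (sigma_u^2 + sigma_e^2); the function name `variance_share` is made up for illustration.

```python
# Values copied from the Stata output above.
sigma_u = 0.02593863  # standard deviation of the route-level effect u_i
sigma_e = 0.36056598  # standard deviation of the idiosyncratic error


def variance_share(sigma_u, sigma_e):
    """Fraction of the total error variance that is due to u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


rho = variance_share(sigma_u, sigma_e)
print(round(rho, 6))  # compare with the rho row reported by Stata
```

A rho this small means the route-level effect accounts for very little of the error variance in this data set.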

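I am not sure why the coefficients differ between the two programs, but the difference is large enough that I am worried about the accuracy of pandas. My main questions are: (1) why do I not get an intercept term from pandas, and (2) why do the coefficients from the two packages not match? Note that I have compared OLS, Logit, and IV2SLS models between Python and Stata and the results match exactly, which makes me think there may be a problem with the random effects implementation in pandas. I am running Python 2.7.9 in IPython 3.0.0 and Stata 14.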


1 Answer

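Your Python code is estimating fixed effects. You can tell from the degrees of freedom: in the Python output they exceed 1,000, while in the Stata output they are below 100. Unlike fixed effects, random effects are not treated as parameters to be estimated. They are assumed to be uncorrelated with X but to follow a specific error structure, which makes RE more efficient than pooled OLS.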
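
To make the distinction concrete, here is a rough, standard-library-only sketch on simulated data (not the airline data). It assumes one regressor, a balanced panel, and known variance components; a real random effects routine estimates those components first. All function names are invented for this illustration. Pooled OLS, the within (fixed effects) estimator, and random effects GLS are the same OLS fit applied after subtracting theta times the group mean, with theta = 0, theta = 1, and a theta in between.

```python
import math
import random


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit that includes an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n_groups=200, n_periods=10, beta=2.0, seed=0):
    """Balanced panel with y = 1 + beta*x + u_i + e_it, u_i independent of x."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, 1.0)  # group effect, sigma_u = 1
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        ys = [1.0 + beta * x + u + rng.gauss(0.0, 1.0) for x in xs]  # sigma_e = 1
        panel.append((xs, ys))
    return panel


def re_theta(sigma_u, sigma_e, n_periods):
    """Quasi-demeaning weight for a balanced panel."""
    return 1.0 - math.sqrt(sigma_e ** 2 / (n_periods * sigma_u ** 2 + sigma_e ** 2))


def quasi_demean(panel, theta):
    """Subtract theta * group mean from x and y within every group."""
    x_out, y_out = [], []
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        x_out.extend(x - theta * mx for x in xs)
        y_out.extend(y - theta * my for y in ys)
    return x_out, y_out


if __name__ == "__main__":
    panel = simulate_panel()
    # Assumption: the true sigma_u = sigma_e = 1 are plugged in directly.
    theta_re = re_theta(1.0, 1.0, 10)
    for name, theta in [("pooled OLS", 0.0), ("random effects", theta_re), ("within / FE", 1.0)]:
        x, y = quasi_demean(panel, theta)
        print(name, round(theta, 3), ols_slope(x, y))
```

Because u_i is drawn independently of x here, all three slopes should land near the true value of 2.0. The within estimator implicitly spends one parameter per group, which is why its degrees of freedom grow with the number of groups, while random effects spends none.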
