Python causal impact analysis: p-value seems incorrect

Posted 2024-09-26 22:50:29


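I am running a causal impact analysis in Python, which measures the effect of an intervention on a treatment group relative to a control group (A/B testing). To get started, I followed https://github.com/jamalsenouci/causalimpact/blob/master/GettingStarted.ipynb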

Suppose my data has the following format:

[Screenshot of the data, not included]

Prime1 is the treatment and Cycle is the control.

The following code runs without errors:

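import pandas as pd
from causalimpact import CausalImpact
# df_AA: date-indexed DataFrame; causalimpact treats the first column as the response, so it should hold Prime1, with the control after it
# start_date, cut_date_1, cut_date_2, end_date and end_date_AA are date strings defined earlier
pre_period = [pd.to_datetime(date) for date in [start_date, cut_date_1]]
post_period = [pd.to_datetime(date) for date in [cut_date_2, end_date]]
# nseasons=7 adds a weekly seasonal component (assuming daily data)
impact = CausalImpact(df_AA.loc[start_date:end_date_AA], pre_period, post_period, model_args={"nseasons": 7})
impact.run()
impact.plot()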
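In the snippet above, the data are sliced up to `end_date_AA` while `post_period` ends at `end_date`. If those two dates differ, the post-period can extend past the rows actually passed to the model. A quick sanity check before calling `CausalImpact` might look like this; `period_problems` is a hypothetical helper, not part of causalimpact, and the dates are made up.

```python
from datetime import date

def period_problems(data_start, data_end, pre_period, post_period):
    """Hypothetical helper: list problems with the pre/post windows (empty list = OK)."""
    pre_start, pre_end = pre_period
    post_start, post_end = post_period
    problems = []
    if not pre_start <= pre_end:
        problems.append("pre-period start is after its end")
    if not post_start <= post_end:
        problems.append("post-period start is after its end")
    if not pre_end < post_start:
        problems.append("pre-period must end before the post-period starts")
    if pre_start < data_start:
        problems.append("pre-period starts before the first row of the data")
    if post_end > data_end:
        problems.append("post-period ends after the last row of the data")
    return problems

# Made-up dates standing in for start_date, cut_date_1, cut_date_2, end_date
# and end_date_AA; here the data end (end_date_AA) is earlier than end_date.
print(period_problems(
    data_start=date(2024, 1, 1), data_end=date(2024, 3, 20),
    pre_period=(date(2024, 1, 1), date(2024, 2, 29)),
    post_period=(date(2024, 3, 1), date(2024, 3, 31)),
))
# prints ['post-period ends after the last row of the data']
```

The same comparisons work on `pd.Timestamp` values such as those produced by `pd.to_datetime`, so the helper can be fed the `pre_period` and `post_period` lists directly.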

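I got the two plots below. The campaign does not appear to be statistically significant, because the confidence interval of the prediction overlaps the actual values.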

[Plots produced by impact.plot(), not included]
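The overlap reasoning can be made concrete if posterior draws of the counterfactual are available: compute the empirical 95% central interval of the draws and check whether the observed value falls inside it. In the sketch below, normally distributed draws are a hypothetical stand-in for what the fitted model would produce (extracting draws from causalimpact is not shown), and the check is applied to a single post-period total.

```python
import random

def central_interval(samples, level=0.95):
    """Empirical central interval: drop (1 - level) / 2 of the draws from each tail."""
    s = sorted(samples)
    k = int(len(s) * (1 - level) / 2)   # number of draws cut from each tail
    return s[k], s[len(s) - 1 - k]

def outside_interval(samples, observed, level=0.95):
    """True if the observed value lies outside the central interval of the draws."""
    lo, hi = central_interval(samples, level)
    return not (lo <= observed <= hi)

random.seed(0)
# Hypothetical posterior draws of the post-period cumulative counterfactual
draws = [random.gauss(100, 5) for _ in range(4000)]
print(central_interval(draws))        # close to (90.2, 109.8), i.e. 100 +/- 1.96 * 5
print(outside_interval(draws, 103))   # observed inside the band -> False
print(outside_interval(draws, 130))   # observed far outside the band -> True
```

A pointwise band overlapping the actual line on a plot is not the same test as an interval on the post-period total, so a visual check on individual days can disagree with the summary's verdict on the cumulative effect.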

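However, in the end I want to answer whether the campaign is statistically significant: what is the p-value between the treatment and control groups? For that I used: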

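# summary() prints the table or report itself and returns None, so wrapping it
# in print() only adds the stray "None" lines seen in the output below
impact.summary()
impact.summary("report")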

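The output I get is below. It reports a p-value of 0.0% and a positive campaign effect, which does not look right. I also tried other data where the actual and predicted values differ greatly, so that the actual values fall outside the prediction CI and the CIs do not overlap, and I still get a p-value of 0. The p-value computed here therefore seems incorrect. Are there any pointers for computing the p-value for this causal impact library myself, or is there a way to fix the library?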

                              Average     Cumulative
Actual                             15            247
Predicted                          15            246
95% CI                       [15, 15]     [244, 249]
                                                    
Absolute Effect                     0              1
95% CI                         [0, 0]        [3, -1]
                                                    
Relative Effect                  0.4%           0.4%
95% CI                  [1.5%, -0.6%]  [1.5%, -0.6%]
                                                    
P-value                          0.0%               
Prob. of Causal Effect         100.0%               
None
 During the post-intervention period, the response variable had an average value of approx. 15.  By contrast, in  the
absence of an intervention, we would have expected an average response of 15. The 90% interval of this counterfactual
prediction is [15, 15]. Subtracting this prediction from the observed response yields an estimate of the causal effect
the intervention had on the response variable. This effect is 0 with a 90% interval of [0, 0]. For a discussion of the
significance of this effect, see below.


 Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully
interpreted), the response variable had an overall value of 247.  By contrast, had  the intervention not taken place, we
would have expected a sum of 247. The 90% interval of this prediction is [244, 249]


 The above results are given in terms of absolute numbers. In relative terms, the response variable showed  an increase
of  0.4%. The 90% interval of this percentage is [1.5%, -0.6%]


 This means that the positive effect observed during the intervention period is statistically significant and unlikely
to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears
substantive significance can only be answered by comparing the absolute effect 0 to the original goal of the underlying
intervention.
None
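To compute a p-value yourself, one option is a one-sided posterior tail-area probability, assuming you can obtain posterior draws of the post-period counterfactual (how to get them out of the Python port is not covered here). The sketch below is modeled, as an assumption, on the formula in the R CausalImpact package: it counts how many simulated cumulative sums are at least as extreme as the observed sum, counting the observed value itself as one extra draw, so p can never be exactly 0. The same idea gives a consistency check on the summary above: its 95% interval for the cumulative effect is printed as [3, -1], i.e. it runs from -1 to 3 and contains 0. If that interval and the p-value come from the same draws, roughly 2.5% or more of the draws lie on each side of the observed value, so a tail-area p-value of 0.0% would not be possible.

```python
import random

def tail_area_p(sampled_sums, observed_sum):
    """One-sided posterior tail-area probability of the observed post-period sum.

    The observed value is counted as one extra draw, so the result is at least
    1 / (len(sampled_sums) + 1) and can never be exactly 0.
    """
    draws = list(sampled_sums) + [observed_sum]
    n_at_least = sum(d >= observed_sum for d in draws)
    n_at_most = sum(d <= observed_sum for d in draws)
    return min(n_at_least, n_at_most) / len(draws)

random.seed(1)
# Hypothetical posterior draws of the cumulative counterfactual (stand-in for model output)
draws = [random.gauss(100, 5) for _ in range(1000)]

p_inside = tail_area_p(draws, 103)   # observed well inside the draws: p around 0.27
p_outside = tail_area_p(draws, 150)  # observed far outside: p = 1/1001, the minimum
print(f"p = {p_inside:.3f}, prob. of causal effect = {1 - p_inside:.1%}")
print(f"p = {p_outside:.4f}, prob. of causal effect = {1 - p_outside:.1%}")
```

"Prob. of Causal Effect" in the summary is reported as 100% minus the p-value, so the two lines always move together; if the p-value is wrong, that line is wrong too.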
