Python scikit-learn (metrics): what is the difference between r2_score and explained_variance_score?

2 Answers

User

Answer 1 · edited 2024-05-17 05:42:32

OK, look at this example:

In [123]:
import numpy as np
from sklearn import metrics
# data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
#what explained_variance_score really is
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
# what r^2 really is (4 is the number of samples, so 4 * std**2 is the total sum of squares)
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(4*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
# Notice that the mean residual is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
# if the predicted values are different, such that the mean residual IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
# Now the two scores are identical
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015

So when the mean residual is 0, the two scores are the same. Which one to choose depends on your needs: is the mean residual supposed to be 0?

User

Answer 2 · edited 2024-05-17 05:42:32

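Most of the answers I found (including the one here) emphasize the difference between R² and the Explained Variance Score, namely the mean residual (i.e. the mean of the errors).

However, an important question gets left behind: why on earth should I care about the mean of the errors?

Refresher:

R² is the coefficient of determination, which measures the amount of variation explained by the (least-squares) linear regression.

For the purpose of evaluating the predicted values of y, you can look at it from a different angle, like this: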

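Variance_{y_actual} × R² = Variance_{y_predicted}

So intuitively, the closer R² is to 1, the closer the variances of the actual and predicted values are, i.e. they have the same spread. Note that this identity holds for a least-squares fit with an intercept, evaluated on the data it was fitted to.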
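
The variance identity above can be checked numerically. The following is a minimal sketch, assuming an ordinary least-squares line with an intercept fitted by `np.polyfit` and evaluated on its own training data; the sample data and all variable names are made up for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical data: a noisy linear relationship (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y_actual = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

# Ordinary least-squares fit with an intercept, evaluated in-sample.
slope, intercept = np.polyfit(x, y_actual, deg=1)
y_predicted = slope * x + intercept

r2 = r2_score(y_actual, y_predicted)

# For an in-sample OLS fit with intercept: Var(y_actual) * R^2 == Var(y_predicted).
lhs = np.var(y_actual) * r2
rhs = np.var(y_predicted)
print(np.isclose(lhs, rhs))
```

On held-out data, or for a model that is not a least-squares fit, the two sides will generally differ, so this should be read as a property of that specific setting rather than a general definition of R².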

As mentioned, the main difference is the mean of the errors, and if we look at the formulas we find that this is true:

R² = 1 - [(Sum of Squared Residuals / n) / Variance_{y_actual}]

Explained Variance Score = 1 - [Variance_{(Y_predicted - Y_actual)} / Variance_{y_actual}]

where:

Variance_{(Y_predicted - Y_actual)} = (Sum of Squared Residuals / n) - (Mean Error)²

Clearly, the only difference is that we subtract the squared Mean Error from the numerator of the first formula... but why?

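When we compare the R² score with the Explained Variance Score, we are basically checking the Mean Error; so if R² = Explained Variance Score, it means that the Mean Error is zero!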
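
To make the relationship concrete, here is a small sketch that recomputes both scores by hand with NumPy, using the same four data points as the first answer, and compares them with scikit-learn. The helper names `r2_manual` and `explained_variance_manual` are hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

def r2_manual(y_actual, y_predicted):
    """R^2 = 1 - (SSR / n) / Var(y_actual), using the population variance."""
    y_actual = np.asarray(y_actual, dtype=float)
    residuals = y_actual - np.asarray(y_predicted, dtype=float)
    return 1.0 - np.mean(residuals ** 2) / np.var(y_actual)

def explained_variance_manual(y_actual, y_predicted):
    """Explained variance = 1 - Var(residuals) / Var(y_actual)."""
    y_actual = np.asarray(y_actual, dtype=float)
    residuals = y_actual - np.asarray(y_predicted, dtype=float)
    return 1.0 - np.var(residuals) / np.var(y_actual)

y_actual = [3, -0.5, 2, 7]
y_predicted = [2.5, 0.0, 2, 8]

mean_error = np.mean(np.asarray(y_actual) - np.asarray(y_predicted))
r2 = r2_manual(y_actual, y_predicted)
ev = explained_variance_manual(y_actual, y_predicted)

# Since Var(e) = mean(e^2) - mean(e)^2, the gap between the two scores
# is the squared mean error scaled by Var(y_actual).
gap = mean_error ** 2 / np.var(y_actual)
print(np.isclose(ev - r2, gap))

# The hand-written versions agree with scikit-learn on this data.
print(np.isclose(r2, r2_score(y_actual, y_predicted)))
print(np.isclose(ev, explained_variance_score(y_actual, y_predicted)))
```

Because the gap is a squared quantity divided by a variance, the Explained Variance Score can never be smaller than R² on the same data; the two coincide exactly when the mean error is zero.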

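The Mean Error reflects the tendency of our estimator, that is: biased vs. unbiased estimation.

In summary:

If you want an unbiased estimator, so that your model neither underestimates nor overestimates, consider taking the mean of the errors into account.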
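
As an illustration of why this matters in practice, the sketch below adds a constant offset to a set of predictions whose mean residual is zero. The offset of 1.0 and the variable names are assumptions chosen for the example; the data points are the ones used in the first answer.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_actual = np.array([3.0, -0.5, 2.0, 7.0])
y_unbiased = np.array([2.5, 0.0, 2.0, 7.0])  # mean residual is 0
y_biased = y_unbiased + 1.0                  # hypothetical constant overestimate

ev_unbiased = explained_variance_score(y_actual, y_unbiased)
ev_biased = explained_variance_score(y_actual, y_biased)
r2_unbiased = r2_score(y_actual, y_unbiased)
r2_biased = r2_score(y_actual, y_biased)

# A constant shift does not change the variance of the residuals,
# so the Explained Variance Score does not notice the bias.
print(np.isclose(ev_unbiased, ev_biased))

# R^2 is based on the raw squared residuals, so the bias lowers it.
print(r2_biased < r2_unbiased)
```

So if systematic over- or underestimation should count against the model, R² is the score that reflects it; the Explained Variance Score alone would rate both sets of predictions equally.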
