Why do the sklearn and statsmodels implementations of OLS regression give different R^2?

    import numpy as np
    import sklearn
    import statsmodels
    import sklearn.linear_model as sl
    import statsmodels.api as sm

    np.random.seed(42)

    # Simulated data: Y = 2 * X + 4 + noise
    N = 1000
    X = np.random.normal(loc=1, size=(N, 1))
    Y = 2 * X.flatten() + 4 + np.random.normal(size=N)

    # Fit with and without an intercept in both libraries
    sklernIntercept = sl.LinearRegression(fit_intercept=True).fit(X, Y)
    sklernNoIntercept = sl.LinearRegression(fit_intercept=False).fit(X, Y)
    statsmodelsIntercept = sm.OLS(Y, sm.add_constant(X))
    statsmodelsNoIntercept = sm.OLS(Y, X)

    # Compare the R^2 reported by each library
    print(sklernIntercept.score(X, Y), statsmodelsIntercept.fit().rsquared)
    print(sklernNoIntercept.score(X, Y), statsmodelsNoIntercept.fit().rsquared)

    print(sklearn.__version__, statsmodels.__version__)

1 answer

User

Answer #1 · Posted 2024-10-01 15:40:25

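As @user333700 pointed out in the comments, statsmodels defines R^2 for OLS differently from scikit-learn. The two definitions agree when the model includes a constant and differ only when the constant is omitted, which is why only the no-intercept fits disagree.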

From the documentation of the RegressionResults class (emphasis mine):

rsquared
R-squared of a model with an intercept. This is defined here as 1 - ssr/centered_tss if the constant is included in the model and 1 - ssr/uncentered_tss if the constant is omitted.
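To make the two denominators concrete, here is a minimal NumPy-only sketch that fits a no-intercept least-squares line and evaluates both formulas from the quoted definition. The data-generating setup mirrors the question, but the variable names (`r2_centered`, `r2_uncentered`) and the use of `np.linalg.lstsq` are my own illustrative choices, not part of either library's API.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(loc=1.0, size=n)
y = 2.0 * x + 4.0 + rng.normal(size=n)

# Least-squares fit WITHOUT an intercept: y ~ beta * x
X = x[:, None]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

ssr = np.sum(resid ** 2)                    # residual sum of squares
centered_tss = np.sum((y - y.mean()) ** 2)  # total SS around the mean of y
uncentered_tss = np.sum(y ** 2)             # total SS around zero

r2_centered = 1.0 - ssr / centered_tss      # formula used when a constant is included
r2_uncentered = 1.0 - ssr / uncentered_tss  # formula used when the constant is omitted

print(r2_centered, r2_uncentered)
```

Because the uncentered total sum of squares is never smaller than the centered one, the uncentered R^2 is always at least as large for the same residuals. With data like this, where y has a mean far from zero, the gap is substantial.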

From the documentation of LinearRegression:

score(X, y, sample_weight=None)
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum().
The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
