Negative cross_val_score with a decision tree regression model

Published 2024-05-09 18:36:49


I am evaluating a decision tree regression model with cross_val_score. The problem is that the scores come out negative, and I really don't understand why.

Here is my code:

import numpy as np
from scipy.stats import sem
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

all_depths = []
all_mean_scores = []
for max_depth in range(1, 11):
    all_depths.append(max_depth)
    simple_tree = DecisionTreeRegressor(max_depth=max_depth)
    cv = KFold(n_splits=2, shuffle=True, random_state=13)
    # df holds my data; features are the columns 'system' through 'gwno'
    scores = cross_val_score(simple_tree, df.loc[:, 'system':'gwno'], df['gdp_growth'], cv=cv)
    mean_score = np.mean(scores)
    all_mean_scores.append(mean_score)
    print("max_depth = ", max_depth, scores, mean_score, sem(scores))

The results show negative scores for every max_depth (full output omitted).

My questions are the following:

1) Is the score returning MSE? If so, how can it be negative?

2) I have a small sample of ~40 observations and ~70 variables. Might this be the problem?

Thanks in advance.


2 Answers

This can happen. It has already been answered in this post!

The actual MSE is simply the positive version of the number you are getting.

The unified scoring API always maximizes the score, so scores that need to be minimized are negated in order for the unified scoring API to work correctly. The score that is returned is therefore negated when it is a score that should be minimized, and left positive when it is a score that should be maximized.
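This sign convention can be checked directly on a toy regression (a minimal sketch; the feature and target arrays below are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X[:, 0] + 0.1 * rng.randn(100)

# With scoring='neg_mean_squared_error', every fold's score is the MSE
# with its sign flipped, so that "bigger is better" still holds.
neg_mse = cross_val_score(DecisionTreeRegressor(max_depth=3, random_state=0),
                          X, y, cv=3, scoring='neg_mean_squared_error')
mse = -neg_mse  # the actual MSE is just the positive version

print(neg_mse)  # all values <= 0
print(mse)      # all values >= 0
```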

TL;DR:

1) No, not unless you specify it explicitly, or unless it is the estimator's default .score method. Since you didn't, it defaults to DecisionTreeRegressor.score, which returns the coefficient of determination, i.e. R^2. That can be negative.

2) Yes, it is a problem. And it explains why you are getting a negative coefficient of determination.
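The TL;DR can be illustrated with sklearn.metrics.r2_score on tiny hand-made numbers (purely illustrative values):

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]

# A constant model predicting the mean of y_true scores exactly 0.0 ...
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0

# ... while a model that does worse than the mean goes negative.
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0
```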

Details:

You called the function like this:

scores = cross_val_score(simple_tree, df.loc[:,'system':'gwno'], df['gdp_growth'], cv=cv)

So you did not explicitly pass a "scoring" parameter. Let's look at the docs:

scoring : string, callable or None, optional, default: None

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

So it isn't stated explicitly, but this likely implies that it uses the estimator's default .score method.
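One way to sanity-check that guess without reading the source is to compare the default against an explicit scoring='r2' on synthetic data (a sketch; the data here is made up):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(13)
X = rng.rand(60, 2)
y = X[:, 0] + 0.05 * rng.randn(60)

cv = KFold(n_splits=2, shuffle=True, random_state=13)
tree = DecisionTreeRegressor(max_depth=2, random_state=0)

default_scores = cross_val_score(tree, X, y, cv=cv)           # scoring=None
r2_scores = cross_val_score(tree, X, y, cv=cv, scoring='r2')  # explicit R^2

# If the default really is the estimator's .score (R^2), these match.
print(np.allclose(default_scores, r2_scores))  # True
```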

To confirm this hypothesis, let's dig into the source code. We can see that the scorer that is ultimately used is the following:

scorer = check_scoring(estimator, scoring=scoring)

Let's look at the source for check_scoring:

has_scoring = scoring is not None
if not hasattr(estimator, 'fit'):
    raise TypeError("estimator should be an estimator implementing "
                    "'fit' method, %r was passed" % estimator)
if isinstance(scoring, six.string_types):
    return get_scorer(scoring)
elif has_scoring:
    # Heuristic to ensure user has not passed a metric
    module = getattr(scoring, '__module__', None)
    if hasattr(module, 'startswith') and \
       module.startswith('sklearn.metrics.') and \
       not module.startswith('sklearn.metrics.scorer') and \
       not module.startswith('sklearn.metrics.tests.'):
        raise ValueError('scoring value %r looks like it is a metric '
                         'function rather than a scorer. A scorer should '
                         'require an estimator as its first parameter. '
                         'Please use `make_scorer` to convert a metric '
                         'to a scorer.' % scoring)
    return get_scorer(scoring)
elif hasattr(estimator, 'score'):
    return _passthrough_scorer
elif allow_none:
    return None
else:
    raise TypeError(
        "If no scoring is specified, the estimator passed should "
        "have a 'score' method. The estimator %r does not." % estimator)

So note that scoring=None has been passed along, so:

has_scoring = scoring is not None

implies that has_scoring == False. Furthermore, the estimator has a .score attribute, so we go down this branch:

elif hasattr(estimator, 'score'):
    return _passthrough_scorer

Which is simply:

def _passthrough_scorer(estimator, *args, **kwargs):
    """Function that wraps estimator.score"""
    return estimator.score(*args, **kwargs)
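In recent scikit-learn versions check_scoring is importable from sklearn.metrics (the answer quotes an older internal module), so this passthrough behaviour can be observed directly (a sketch on made-up data):

```python
import numpy as np
from sklearn.metrics import check_scoring
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = X.sum(axis=1)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# With scoring=None, check_scoring falls back to the estimator's own
# .score method, so the scorer and tree.score agree.
scorer = check_scoring(tree, scoring=None)
print(np.isclose(scorer(tree, X, y), tree.score(X, y)))  # True
```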

Finally, we now know that the scorer is your estimator's default score method. Let's check the docs for the estimator, which clearly state:

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

So it appears your score is indeed the coefficient of determination. Basically, a negative R^2 means your model is performing very poorly: worse than if we simply predicted the expected value (i.e. the mean) for every input. That makes sense, because, as you say:

I have a small sample of ~40 observations and ~70 variables. Might this be the problem?

Yes, it is a problem. With only 40 observations, there is little hope of making meaningful predictions in a 70-dimensional problem space.
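To see how hopeless n=40 with p=70 is, here is a sketch with pure-noise data of the same shape as described (random data, not the asker's):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# 40 samples, 70 features, and a target with no real signal.
rng = np.random.RandomState(13)
X = rng.randn(40, 70)
y = rng.randn(40)

cv = KFold(n_splits=2, shuffle=True, random_state=13)
scores = cross_val_score(DecisionTreeRegressor(max_depth=3, random_state=0),
                         X, y, cv=cv)

# On held-out folds the tree does worse than predicting the mean,
# so the cross-validated R^2 comes out negative.
print(scores)
```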
