Answer to the question: negative cross_val_score with a decision tree regression model


1 answer

Anonymous, 1 day ago


<h2>TL;DR:</h2>
1) No, not unless you specify it explicitly or it happens to be the estimator's default <code>.score</code> method. Since you did not, the scoring defaults to <code>DecisionTreeRegressor.score</code>, which returns the coefficient of determination, i.e. R^2, and that can be negative.
2) Yes, this is a problem, and it explains why you are getting a negative coefficient of determination.
<h2>Details:</h2>
You called the function like this:
<pre><code>scores = cross_val_score(simple_tree, df.loc[:,'system':'gwno'], df['gdp_growth'], cv=cv)
</code></pre>
So you did not explicitly pass a "scoring" argument. Let's look at the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html" rel="nofollow noreferrer">docs</a>:
<blockquote> scoring : string, callable or None, optional, default: None A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). </blockquote>
The docs do not say so explicitly, but this probably means that the estimator's default <code>.score</code> method is used.
To confirm this hypothesis, let's dig into the <a href="https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/model_selection/_validation.py#L128" rel="nofollow noreferrer">source code</a>. We see that the scorer ultimately used is the one returned by <code>check_scoring(estimator, scoring=scoring)</code>.
So let's look at the <a href="https://github.com/scikit-learn/scikit-learn/blob/ab93d657eb4268ac20c4db01c48065b5a1bfe80d/sklearn/metrics/scorer.py#L247" rel="nofollow noreferrer">source for <code>check_scoring</code></a>:
<pre><code>has_scoring = scoring is not None
if not hasattr(estimator, 'fit'):
    raise TypeError("estimator should be an estimator implementing "
                    "'fit' method, %r was passed" % estimator)
if isinstance(scoring, six.string_types):
    return get_scorer(scoring)
elif has_scoring:
    # Heuristic to ensure user has not passed a metric
    module = getattr(scoring, '__module__', None)
    if hasattr(module, 'startswith') and \
       module.startswith('sklearn.metrics.') and \
       not module.startswith('sklearn.metrics.scorer') and \
       not module.startswith('sklearn.metrics.tests.'):
        raise ValueError('scoring value %r looks like it is a metric '
                         'function rather than a scorer. A scorer should '
                         'require an estimator as its first parameter. '
                         'Please use `make_scorer` to convert a metric '
                         'to a scorer.' % scoring)
    return get_scorer(scoring)
elif hasattr(estimator, 'score'):
    return _passthrough_scorer
elif allow_none:
    return None
else:
    raise TypeError(
        "If no scoring is specified, the estimator passed should "
        "have a 'score' method. The estimator %r does not."
        % estimator)
</code></pre>
Note that <code>scoring=None</code> is what was passed in, so:
<pre><code>has_scoring = scoring is not None
</code></pre>
implies <code>has_scoring == False</code>. The estimator also has a <code>.score</code> attribute, so we end up in this branch:
<pre><code>elif hasattr(estimator, 'score'):
    return _passthrough_scorer
</code></pre>
which is simply:
<pre><code>def _passthrough_scorer(estimator, *args, **kwargs):
    """Function that wraps estimator.score"""
    return estimator.score(*args, **kwargs)
</code></pre>
So we now know that <code>scorer</code> is just your estimator's default <code>score</code> method. Let's check the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor.score" rel="nofollow noreferrer">docs for the estimator</a>, which state clearly:
<blockquote> Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0. </blockquote>
So your scores really are coefficients of determination. A negative R^2 means that your model performs very poorly: worse than simply predicting the expected value (i.e. the mean) of the target for every input. That makes sense because, as you said:
<blockquote> I have a small sample of ~40 observations and ~70 variables. Might this be the problem? </blockquote>
This is a problem. With only 40 observations, making meaningful predictions in a 70-dimensional problem space is close to hopeless.
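To make the "it can be negative" part concrete, here is a minimal sketch of the R^2 definition from the quoted docs, written in plain Python. It is not scikit-learn's implementation; the helper name <code>r2_score_manual</code> and the toy numbers are made up for illustration, and the helper assumes the targets are not all identical (otherwise v would be zero).

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - u/v (illustrative helper)."""
    mean_y = sum(y_true) / len(y_true)
    # u: sum of squared errors of the predictions
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: sum of squared deviations of the targets from their mean
    # (assumed non-zero, i.e. y_true is not constant)
    v = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - u / v


# Hypothetical toy targets; their mean is 2.5
y_true = [1.0, 2.0, 3.0, 4.0]

# Predicting every target exactly gives the best possible score
perfect = r2_score_manual(y_true, y_true)

# Always predicting the mean gives a score of zero
mean_only = r2_score_manual(y_true, [2.5, 2.5, 2.5, 2.5])

# Predictions that are worse than the mean give a negative score
reversed_pred = r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0])

print(perfect, mean_only, reversed_pred)  # 1.0 0.0 -3.0
```

The third case is the situation described in the answer: whenever the squared error of the predictions exceeds the squared error of the constant mean predictor, u/v is greater than 1 and the score drops below zero.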
