<h2>TL;DR:</h2>
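<p>1) No, not unless you explicitly specify it, or it happens to be the estimator's default <code>.score</code> method. Since you didn't, it defaults to <code>DecisionTreeRegressor.score</code>, which returns the coefficient of determination, i.e. R^2. That value can be negative.</p>
<p>2) Yes, that's a problem. It also explains why you are getting a negative coefficient of determination.</p>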
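<p>If you want a metric other than R^2, pass <code>scoring</code> explicitly. Below is a minimal sketch of the three usual options: a built-in scorer name, a callable with the <code>scorer(estimator, X, y)</code> signature, and a plain metric wrapped with <code>make_scorer</code>. It makes two assumptions. First, synthetic data from <code>make_regression</code> stands in for your DataFrame, which we don't have. Second, <code>neg_rmse_scorer</code> is a hypothetical helper written only for illustration. scikit-learn scorers follow a "greater is better" convention, which is why the error metrics come back negated.</p>
<pre><code>import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the question's data (assumption: 40 rows, 70 features)
X, y = make_regression(n_samples=40, n_features=70, noise=10.0, random_state=0)
tree = DecisionTreeRegressor(random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same splits on every call

# 1) Built-in scorer by name; MSE is exposed negated ("greater is better")
neg_mse = cross_val_score(tree, X, y, cv=cv, scoring="neg_mean_squared_error")

# 2) Hypothetical hand-written scorer with the signature scorer(estimator, X, y)
def neg_rmse_scorer(estimator, X, y):
    """Negative root-mean-squared error of estimator on (X, y)."""
    y_pred = estimator.predict(X)
    return -np.sqrt(np.mean((y - y_pred) ** 2))

neg_rmse = cross_val_score(tree, X, y, cv=cv, scoring=neg_rmse_scorer)

# 3) Plain metric turned into a scorer; greater_is_better=False flips the sign
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
neg_mse_again = cross_val_score(tree, X, y, cv=cv, scoring=mse_scorer)

print("neg MSE per fold: ", neg_mse)
print("neg RMSE per fold:", neg_rmse)
</code></pre>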
<h2>Details:</h2>
<p>The function you are using is:</p>
<pre><code>scores = cross_val_score(simple_tree, df.loc[:,'system':'gwno'], df['gdp_growth'], cv=cv)
</code></pre>
<p>So you are not explicitly passing a <code>scoring</code> argument. Let's look at the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html" rel="nofollow noreferrer">docs</a>:</p>
<blockquote>
<p>scoring : string, callable or None, optional, default: None</p>
<p>A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).</p>
</blockquote>
<p>So the docs don't say it explicitly, but this probably means it falls back to the estimator's default <code>.score</code> method.</p>
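<p>One quick empirical check of this hypothesis is to run <code>cross_val_score</code> without <code>scoring</code> and compare it with two alternatives: refitting a fresh clone on each fold and calling <code>.score</code> by hand, and passing <code>scoring="r2"</code>. If the hypothesis holds, all three should agree fold by fold. This is a sketch on synthetic data, which is an assumption since <code>make_regression</code> stands in for your DataFrame. A shuffled <code>KFold</code> with a fixed <code>random_state</code> ensures every call sees the same splits.</p>
<pre><code>import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=40, n_features=70, noise=10.0, random_state=0)
tree = DecisionTreeRegressor(random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# No scoring argument, exactly like the call in the question
default_scores = cross_val_score(tree, X, y, cv=cv)

# The same computation by hand: fit a fresh clone per fold, then call .score
manual_scores = []
for train_idx, test_idx in cv.split(X):
    est = clone(tree).fit(X[train_idx], y[train_idx])
    manual_scores.append(est.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# Explicitly requesting R^2 by name
r2_scores = cross_val_score(tree, X, y, cv=cv, scoring="r2")

print(default_scores)
print(np.allclose(default_scores, manual_scores), np.allclose(default_scores, r2_scores))
</code></pre>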
<p>To confirm this hypothesis, let's dig into the <a href="https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/model_selection/_validation.py#L128" rel="nofollow noreferrer">source code</a>. We see that the scorer that ends up being used is:</p>
<pre><code>scorer = check_scoring(estimator, scoring=scoring)
</code></pre>
<p>Let's look at the <a href="https://github.com/scikit-learn/scikit-learn/blob/ab93d657eb4268ac20c4db01c48065b5a1bfe80d/sklearn/metrics/scorer.py#L247" rel="nofollow noreferrer">source for <code>check_scoring</code></a>:</p>
<pre><code>def check_scoring(estimator, scoring=None, allow_none=False):
    # (docstring omitted)
    has_scoring = scoring is not None
    if not hasattr(estimator, 'fit'):
        raise TypeError("estimator should be an estimator implementing "
                        "'fit' method, %r was passed" % estimator)
    if isinstance(scoring, six.string_types):
        return get_scorer(scoring)
    elif has_scoring:
        # Heuristic to ensure user has not passed a metric
        module = getattr(scoring, '__module__', None)
        if hasattr(module, 'startswith') and \
           module.startswith('sklearn.metrics.') and \
           not module.startswith('sklearn.metrics.scorer') and \
           not module.startswith('sklearn.metrics.tests.'):
            raise ValueError('scoring value %r looks like it is a metric '
                             'function rather than a scorer. A scorer should '
                             'require an estimator as its first parameter. '
                             'Please use `make_scorer` to convert a metric '
                             'to a scorer.' % scoring)
        return get_scorer(scoring)
    elif hasattr(estimator, 'score'):
        return _passthrough_scorer
    elif allow_none:
        return None
    else:
        raise TypeError(
            "If no scoring is specified, the estimator passed should "
            "have a 'score' method. The estimator %r does not." % estimator)
</code></pre>
<p>Note that you passed <code>scoring=None</code> (the default), so:</p>
<pre><code>has_scoring = scoring is not None
</code></pre>
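<p>implies <code>has_scoring == False</code>. <code>None</code> is also not a string, so the <code>isinstance</code> check fails too. Since the estimator has a <code>.score</code> attribute, we end up in this branch:</p>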
<pre><code>elif hasattr(estimator, 'score'):
    return _passthrough_scorer
</code></pre>
<p>which is simply:</p>
<pre><code>def _passthrough_scorer(estimator, *args, **kwargs):
    """Function that wraps estimator.score"""
    return estimator.score(*args, **kwargs)
</code></pre>
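<p>You can check this dispatch against the scikit-learn you have installed. This is a minimal sketch on synthetic data, and it assumes a reasonably recent release in which <code>check_scoring</code> is importable from <code>sklearn.metrics</code>. Newer versions wrap the same idea in a small private passthrough object rather than the <code>_passthrough_scorer</code> function shown above, so the exact type may differ by version. Calling it should still return the same number as <code>tree.score</code>.</p>
<pre><code>import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import check_scoring
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption) and a fitted tree
X, y = make_regression(n_samples=40, n_features=5, noise=1.0, random_state=0)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# scoring=None: no metric requested, so the estimator's own .score is used
scorer = check_scoring(tree, scoring=None)

via_scorer = scorer(tree, X, y)   # scorer(estimator, X, y) calling convention
via_method = tree.score(X, y)     # DecisionTreeRegressor.score, i.e. R^2
print(type(scorer), via_scorer, via_method)
</code></pre>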
<p>So we now know that <code>scorer</code> is just your estimator's default <code>score</code> method. Let's check the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor.score" rel="nofollow noreferrer">docs for the estimator</a>, which clearly state:</p>
<blockquote>
<p>Returns the coefficient of determination R^2 of the prediction.</p>
<p>The coefficient R^2 is defined as (1 - u/v), where u is the regression
sum of squares ((y_true - y_pred) ** 2).sum() and v is the residual
sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible
score is 1.0 and it can be negative (because the model can be
arbitrarily worse). A constant model that always predicts the expected
value of y, disregarding the input features, would get a R^2 score of
0.0.</p>
</blockquote>
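<p>The formula is short enough to compute by hand. Below is a minimal NumPy sketch that follows the quoted docstring; the helper name <code>r_squared</code> is ours, not scikit-learn's. One correction to the quoted wording: <code>u</code>, the sum of squared prediction errors, is the <em>residual</em> sum of squares, and <code>v</code>, the sum of squared deviations from the mean, is the <em>total</em> sum of squares. The three calls show a perfect fit, the constant-mean model mentioned in the docstring, and a model that does worse than the mean.</p>
<pre><code>import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - u/v."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - u / v

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                           # perfect fit: 1.0
print(r_squared(y, np.full_like(y, y.mean())))   # always predict the mean: 0.0
print(r_squared(y, y[::-1]))                     # worse than the mean: -3.0
</code></pre>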
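<p>So your score is in fact the coefficient of determination. A negative R^2 means your model is performing badly: worse than simply predicting the expected value (i.e. the mean) for every input. That makes sense because, as you said:</p>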
<blockquote>
<p>I have a small sample of ~40 observations and ~70 variables. Might
this be the problem?</p>
</blockquote>
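<p>That is a problem. With only 40 observations, making meaningful predictions in a 70-dimensional feature space is nearly hopeless. You have more features than observations, so a flexible model such as a decision tree can easily fit noise in the training folds that does not carry over to the held-out fold.</p>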
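<p>One way to see this is on synthetic data shaped like yours: 40 rows by 70 random Gaussian features, with a target that is pure noise. This is an assumption for illustration, not your dataset. The sketch cross-validates an unconstrained tree against <code>DummyRegressor(strategy="mean")</code>, which always predicts the training-fold mean. Expect the tree's R^2 to come out well below zero. The dummy's R^2 should sit at or just below zero, because R^2 is measured against the held-out fold's own mean, so even the mean baseline cannot beat 0 out of fold.</p>
<pre><code>import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
# Synthetic data (assumption): ~40 observations, ~70 features, target is pure noise
X = rng.normal(size=(40, 70))
y = rng.normal(size=40)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
tree_r2 = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=cv)
mean_r2 = cross_val_score(DummyRegressor(strategy="mean"), X, y, cv=cv)

print("tree  R^2 per fold:", tree_r2.round(2), "mean:", tree_r2.mean().round(2))
print("dummy R^2 per fold:", mean_r2.round(2), "mean:", mean_r2.mean().round(2))
</code></pre>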