<p>As a complement to TG61591's answer: if you dig into the <a href="https://github.com/scikit-learn/scikit-learn/blob/0fb307bf3/sklearn/ensemble/_gb.py#L771" rel="nofollow noreferrer">code</a>, you will find an additional note with some useful information about how the <code>max_features</code> hyperparameter works in the model:</p>
<blockquote>
<p>Notes:
The features are always randomly permuted at each split. Therefore,
the best found split may vary, even with the same training data and
<code>max_features=n_features</code>, if the improvement of the criterion is
identical for several splits enumerated during the search of the best
split. To obtain a deterministic behaviour during fitting,
<code>random_state</code> has to be fixed.</p>
</blockquote>
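<p>To make the note concrete, here is a minimal sketch of pinning <code>random_state</code> so that repeated fits agree. The synthetic dataset, the seed value and the helper name <code>fit_importances</code> are my own illustrative choices, not something taken from the linked source.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data; the sizes and the seed are arbitrary choices for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)


def fit_importances(seed):
    """Fit a small model with the given seed and return its feature importances."""
    model = GradientBoostingClassifier(
        n_estimators=20,
        max_features=None,  # None means every feature is considered at each split
        random_state=seed,
    )
    model.fit(X, y)
    return model.feature_importances_


# Same seed twice: the feature permutations are replayed, so both fits agree.
same_seed_match = np.array_equal(fit_importances(42), fit_importances(42))
print("identical with fixed random_state:", same_seed_match)
```

<p>Without a fixed seed the two fits may still come out identical. They can only differ when several candidate splits tie on the criterion, which is exactly the case the note describes.</p>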
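<p>Moreover, the key concept is that random forests need added randomness in order to reduce the variance of the model (although this may result in higher bias), as this last note warns:</p>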
<blockquote>
<p>Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.</p>
</blockquote>
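<p>One way to observe this behaviour, as a rough sketch: build a toy dataset in which one of the two columns is constant, so it can never yield a valid partition, and fit single trees with <code>max_features=1</code>. The data, the seed range and the variable names are assumptions made purely for illustration.</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_samples = 100
informative = rng.normal(size=n_samples)

# Column 0 is constant (no valid split exists on it); column 1 determines the label.
X = np.column_stack([np.zeros(n_samples), informative])
y = (informative > 0).astype(int)

root_features = []
for seed in range(20):
    # max_features=1 nominally allows only one feature to be inspected per split.
    tree = DecisionTreeClassifier(max_features=1, random_state=seed)
    tree.fit(X, y)
    # tree_.feature[0] is the feature index used at the root (negative for a leaf).
    root_features.append(int(tree.tree_.feature[0]))

print("features used at the root across seeds:", set(root_features))
```

<p>If the search stopped after inspecting exactly one feature, some seeds would draw the constant column and leave the root unsplit. Instead, every tree ends up splitting on column 1.</p>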