How does "max_features" limit the number of features in sklearn ensemble models?

2 Answers

User

#1 · Edited 2024-10-08 16:40:19

However, it seems that the maximum number of features resets at each node. That is, for any given node, the estimator randomly selects 10 features, chooses the best one, splits the node and repeats the process for all subsequent nodes.

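Your understanding here is correct. In both gradient boosting and random forests, every split in every tree draws a fresh random subset of max_features candidate features (the literature calls this parameter mtry) and evaluates only those. So max_features caps how many features are considered per split, not how many features the model as a whole can use. Because no split evaluates every feature, this mechanism introduces randomness between the trees.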
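To see the per-split behaviour described above, here is a minimal sketch. The synthetic dataset, the value max_features=3, and the variable names are my own assumptions for illustration; the idea is simply to count how many distinct features end up in the fitted trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 20 features, 10 of them informative (arbitrary choices).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=10, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, max_features=3, random_state=0)
forest.fit(X, y)

# Each fitted tree stores the resolved integer cap in max_features_.
per_split_cap = forest.estimators_[0].max_features_

# tree_.feature holds the feature index used at each node; leaves are negative.
used_per_tree = [
    np.unique(est.tree_.feature[est.tree_.feature >= 0])
    for est in forest.estimators_
]
used_by_forest = np.unique(np.concatenate(used_per_tree))

print("candidate features per split:", per_split_cap)
print("distinct features used by the first tree:", len(used_per_tree[0]))
print("distinct features used by the whole forest:", len(used_by_forest))
```

A single tree ends up splitting on more than 3 distinct features even though each split only looks at 3 candidates, which is the "reset at each node" behaviour the question describes.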

User
#2 · Edited 2024-10-08 16:40:19

To add to TG61591's answer: if you dig into the code, you will find a note with some useful information about how the max_features hyperparameter works in the model:
Notes - The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
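As a quick check of the note about random_state, the sketch below fits the same forest twice with the same seed and compares the resulting trees. The dataset, the seed value 42, and the helper fit_forest are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def fit_forest(seed):
    # max_features=None makes every feature a candidate at every split;
    # random_state still drives the feature permutation and the bootstrap.
    model = RandomForestClassifier(
        n_estimators=10, max_features=None, random_state=seed
    )
    return model.fit(X, y)


a = fit_forest(42)
b = fit_forest(42)

# Compare split features and thresholds node by node, tree by tree.
same_structure = all(
    np.array_equal(ta.tree_.feature, tb.tree_.feature)
    and np.array_equal(ta.tree_.threshold, tb.tree_.threshold)
    for ta, tb in zip(a.estimators_, b.estimators_)
)
print("same seed gives identical trees:", same_structure)
```

Leaving random_state unset may or may not change the trees on a given dataset, so the sketch only checks the fixed-seed case.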
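Beyond that, the key idea is that a random forest needs this added randomness to reduce the variance of the model (although it may increase the bias). Keep the caveat in this last note in mind as well: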
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
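One way to observe this last note is with a toy dataset where nine of ten columns are constant, so only one column allows a valid partition. This is a sketch under my own assumptions (the data layout and variable names are made up for the example); it relies on the documented behaviour that the split search keeps going until a valid partition is found.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_samples = 200

# Columns 0..8 are constant, so no valid split exists on them.
X = np.zeros((n_samples, 10))
# Column 9 is the only feature that can separate the classes.
X[:, 9] = rng.integers(0, 2, size=n_samples)
y = X[:, 9].astype(int)

# Only one candidate feature per split is requested.
tree = DecisionTreeClassifier(max_features=1, random_state=0)
tree.fit(X, y)

root_feature = tree.tree_.feature[0]
print("feature used at the root:", root_feature)
print("training accuracy:", tree.score(X, y))
```

Even with max_features=1 the root splits on column 9, because drawing a constant column does not yield a valid partition and the search moves on to further features.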
