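I am trying to implement R's feature importance scoring method for random forest regression models in sklearn. According to R's documentation: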
The first measure is computed from permuting OOB data: For each tree, the prediction error on the out-of-bag portion of the data is recorded (error rate for classification, MSE for regression). Then the same is done after permuting each predictor variable. The difference between the two are then averaged over all trees, and normalized by the standard deviation of the differences. If the standard deviation of the differences is equal to 0 for a variable, the division is not done (but the average is almost always equal to 0 in that case).
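So, if I understand correctly, I need to be able to permute each predictor variable (feature) over the OOB samples of each tree.
I know I can access each tree in the trained forest like this: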
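from sklearn.ensemble import RandomForestRegressor

numberTrees = 100
clf = RandomForestRegressor(n_estimators=numberTrees)
clf.fit(X, Y)
# Each element of estimators_ is a fitted DecisionTreeRegressor
for tree in clf.estimators_:
    pass  # do something with each tree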
Is there a way to get the list of OOB samples for each tree? Perhaps I could use each tree's random_state to derive the list of OOB samples?
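For illustration, here is a minimal sketch of the procedure the R documentation describes. It is not a way to recover the OOB rows of an already fitted RandomForestRegressor. Instead it sidesteps that question by drawing the bootstrap samples by hand and fitting one DecisionTreeRegressor per sample, so the OOB rows of every tree are known by construction. The helper name `oob_permutation_importance`, its parameters, and the synthetic data are all hypothetical, and the normalization divides by the standard deviation as the quoted text says.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def oob_permutation_importance(X, y, n_trees=100, max_features=1.0, seed=0):
    """Hypothetical helper: hand-bagged trees scored by permuting OOB rows."""
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    diffs = []  # one row per tree: increase in OOB MSE for each feature

    for _ in range(n_trees):
        # Draw the bootstrap sample ourselves so the OOB rows are known.
        in_bag = rng.randint(0, n_samples, n_samples)
        oob = np.setdiff1d(np.arange(n_samples), in_bag)
        if oob.size == 0:
            continue  # nothing to score for this tree

        tree = DecisionTreeRegressor(
            max_features=max_features,
            random_state=int(rng.randint(2**31 - 1)),
        )
        tree.fit(X[in_bag], y[in_bag])

        X_oob, y_oob = X[oob], y[oob]
        base_mse = np.mean((y_oob - tree.predict(X_oob)) ** 2)

        tree_diffs = np.empty(n_features)
        for j in range(n_features):
            # Permute one feature within the OOB rows and re-score the tree.
            X_perm = X_oob.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            perm_mse = np.mean((y_oob - tree.predict(X_perm)) ** 2)
            tree_diffs[j] = perm_mse - base_mse
        diffs.append(tree_diffs)

    diffs = np.asarray(diffs)
    mean_diff = diffs.mean(axis=0)
    std_diff = diffs.std(axis=0)
    # Divide by the standard deviation, skipping features where it is 0.
    return np.divide(mean_diff, std_diff, out=mean_diff.copy(), where=std_diff > 0)


# Synthetic data (an assumption for the demo): only feature 0 drives Y.
demo_rng = np.random.RandomState(42)
X = demo_rng.normal(size=(300, 3))
Y = 5.0 * X[:, 0] + demo_rng.normal(scale=0.1, size=300)

importance = oob_permutation_importance(X, Y, n_trees=50)
print(importance)
```

Because the trees here are bagged by hand, the scores will not match those of a particular fitted `clf` exactly; the sketch only shows the per-tree permute-and-compare loop on OOB rows.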
Although R uses the OOB samples, I have found that I get similar results in scikit-learn by using all of the training samples. I am doing the following: