<p>The scores themselves are computed in the <a href="https://github.com/scikit-learn/scikit-learn/blob/e070b7451707e6bd1c1119c64bccd34df31fffe6/sklearn/ensemble/forest.py#L354" rel="nofollow noreferrer">feature_importances_</a> property of the <code>BaseForest</code> class. They are calculated as</p>
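<pre><code># average over the trees, then rescale so the result sums to 1
mean_importances = np.mean(all_importances, axis=0, dtype=np.float64)
mean_importances / np.sum(mean_importances)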
</code></pre>
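<p>where <code>all_importances</code> is the list of <code>feature_importances_</code> arrays of the estimators of <a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html" rel="nofollow noreferrer"><code>ExtraTreesClassifier</code></a>. The number of estimators is set by the <code>n_estimators</code> parameter of <code>ExtraTreesClassifier</code>. By default there are 10 estimators (the default value of <code>n_estimators</code> will change from 10 in version <code>0.20</code> to 100 in version <code>0.22</code>). In the listings below, <code>est</code> is the list of fitted trees of such a forest.</p>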
<p>So <code>all_importances</code> looks like</p>
<pre><code>[x.feature_importances_ for x in est]
Out[59]:
[array([0., 0., 1.]),
array([0., 0., 1.]),
array([0., 0., 1.]),
array([0.33333333, 0. , 0.66666667]),
array([0.11111111, 0.88888889, 0. ]),
array([0., 1., 0.]),
array([0., 0., 1.]),
array([0., 1., 0.]),
array([0., 0., 1.]),
array([0.33333333, 0.66666667, 0. ])]
</code></pre>
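<p>As a rough check of the averaging step, the sketch below recomputes the forest-level scores from the ten per-tree arrays listed above. It uses plain Python lists rather than NumPy so that it runs without dependencies; the helper name <code>average_importances</code> is made up for this illustration and is not part of scikit-learn.</p>

```python
# Per-tree importances copied from the listing above (one row per tree).
all_importances = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.33333333, 0.0, 0.66666667],
    [0.11111111, 0.88888889, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.33333333, 0.66666667, 0.0],
]


def average_importances(per_tree):
    """Hypothetical helper: column-wise mean over trees, rescaled to sum to 1."""
    n_trees = len(per_tree)
    n_features = len(per_tree[0])
    mean = [sum(row[j] for row in per_tree) / n_trees for j in range(n_features)]
    total = sum(mean)
    # Guard against an all-zero vector before rescaling.
    return [m / total for m in mean] if total > 0 else mean


forest_importances = average_importances(all_importances)
print([round(v, 4) for v in forest_importances])  # [0.0778, 0.3556, 0.5667]
```

<p>Because every per-tree array here already sums to 1, the mean also sums to 1 and the final rescaling changes almost nothing; it only matters when some trees contribute an all-zero array.</p>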
<p>The <code>feature_importances_</code> of each estimator is computed by the <a href="https://github.com/scikit-learn/scikit-learn/blob/e070b7451707e6bd1c1119c64bccd34df31fffe6/sklearn/tree/_tree.pyx#L1053" rel="nofollow noreferrer"><code>compute_feature_importances</code></a> method of the <a href="https://github.com/scikit-learn/scikit-learn/blob/e070b7451707e6bd1c1119c64bccd34df31fffe6/sklearn/tree/_tree.pyx#L506" rel="nofollow noreferrer"><code>Tree</code></a> class, which is written in Cython. It iterates over every node of the tree and adds that node's contribution to the feature it splits on:</p>
<pre><code>feature_importances_[node.feature] += (
    node.weighted_n_node_samples * node.impurity
    - left.weighted_n_node_samples * left.impurity
    - right.weighted_n_node_samples * right.impurity
)
</code></pre>
<p>where <code>weighted_n_node_samples</code> and <code>impurity</code> are arrays holding the per-node values:</p>
<pre><code>est[0].tree_.feature
Out[60]: array([ 2, 2, -2, -2, -2], dtype=int64)
est[0].tree_.weighted_n_node_samples
Out[61]: array([4., 2., 1., 1., 2.])
est[0].tree_.impurity
Out[62]: array([0.375, 0.5 , 0. , 0. , 0. ])
</code></pre>
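<p>To make the per-node sum concrete, here is a small dependency-free sketch that applies it to the three arrays printed above for <code>est[0]</code>. Two things are assumptions on my part: the child index arrays are not shown above, so I use the layout a depth-first numbering would give for these sample counts, and the final division by the root's weighted sample count reflects my reading of the linked Cython method. The function name <code>raw_tree_importances</code> is hypothetical.</p>

```python
# Arrays for est[0].tree_ as printed above.
feature = [2, 2, -2, -2, -2]
weighted_n_node_samples = [4.0, 2.0, 1.0, 1.0, 2.0]
impurity = [0.375, 0.5, 0.0, 0.0, 0.0]

# ASSUMPTION: child indices are not shown in the text. This layout (node 1 and
# node 4 under the root, nodes 2 and 3 under node 1) is consistent with the
# sample counts above; -1 marks a leaf.
children_left = [1, 2, -1, -1, -1]
children_right = [4, 3, -1, -1, -1]


def raw_tree_importances(n_features):
    """Hypothetical helper: unnormalized impurity decrease per feature."""
    importances = [0.0] * n_features
    for node in range(len(feature)):
        left, right = children_left[node], children_right[node]
        if left == -1:
            continue  # leaves do not split on any feature
        importances[feature[node]] += (
            weighted_n_node_samples[node] * impurity[node]
            - weighted_n_node_samples[left] * impurity[left]
            - weighted_n_node_samples[right] * impurity[right]
        )
    # ASSUMPTION: scale by the weighted sample count at the root.
    root_weight = weighted_n_node_samples[0]
    return [v / root_weight for v in importances]


print(raw_tree_importances(3))  # [0.0, 0.0, 0.375]
```

<p>Only feature 2 is ever used for a split in this tree, so it receives all of the impurity decrease, which agrees with the <code>[0., 0., 1.]</code> reported for the first tree once the vector is rescaled to sum to 1.</p>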
<p><code>feature_importances_</code> is normalized after it is computed. You can see the raw values by calling <code>compute_feature_importances</code> with <code>normalize=False</code>:</p>
<pre><code>est[3].tree_.compute_feature_importances(normalize=False)
Out[63]: array([0.125, 0. , 0.25 ])
est[3].tree_.compute_feature_importances()
Out[64]: array([0.33333333, 0. , 0.66666667])
</code></pre>
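<p>The relation between the two outputs above can be reproduced by hand: dividing the raw vector by its own sum gives the normalized one. The snippet below is only an illustration in plain Python, and <code>normalize_importances</code> is a hypothetical name, not a scikit-learn function.</p>

```python
def normalize_importances(raw):
    """Hypothetical helper: rescale a raw importance vector to sum to 1."""
    total = sum(raw)
    if total > 0:
        return [v / total for v in raw]
    # An all-zero vector (for example, a tree with no splits) is left as it is.
    return list(raw)


raw = [0.125, 0.0, 0.25]  # est[3] with normalize=False, from the output above
normalized = normalize_importances(raw)
print([round(v, 8) for v in normalized])  # [0.33333333, 0.0, 0.66666667]
```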