<p>You can pass a <code>sample_weight</code> argument to the random forest's <a href="http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.fit">fit method</a>:</p>
<pre><code>sample_weight : array-like, shape = [n_samples] or None
</code></pre>
<blockquote>
<p>Sample weights. If None, then samples are equally weighted. Splits
that would create child nodes with net zero or negative weight are
ignored while searching for a split in each node. In the case of
classification, splits are also ignored if they would result in any
single class carrying a negative weight in either child node.</p>
</blockquote>
<p>In older versions there was a <code>preprocessing.balance_weights</code> method that generated balance weights for given samples, such that classes became uniformly distributed. It is still present in the internal but usable <a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/_weights.py">preprocessing._weights</a> module, but is deprecated and will be removed in a future version. The exact reasons for this are unclear.</p>
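<p>As a sketch of a non-deprecated alternative: scikit-learn also exposes <code>sklearn.utils.class_weight.compute_sample_weight</code>, which with <code>class_weight="balanced"</code> computes per-sample weights inversely proportional to class frequencies. The toy label array below is made up for illustration:</p>
<pre><code>import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical labels: class 1 is five times as frequent as class 0
y = np.array([1, 1, 1, 1, 1, 0])

# "balanced" weights each sample by n_samples / (n_classes * count(class))
weights = compute_sample_weight(class_weight="balanced", y=y)
</code></pre>
<p>Here the single class-0 sample receives a larger weight than each class-1 sample, so the weighted class totals come out equal. The resulting array can be passed directly as <code>sample_weight</code> to <code>fit</code>.</p>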
<p><strong>更新</strong></p>
<p>Some clarification, as you seem to be confused. Using <code>sample_weight</code> is straightforward once you remember that its purpose is to balance the target classes in the training dataset. That is, if you have <code>X</code> as observations and <code>y</code> as classes (labels), then <code>len(X) == len(y) == len(sample_weight)</code>, and each element of the <code>sample_weight</code> 1-d array represents the weight of the corresponding <code>(observation, label)</code> pair. For your case, if class <code>1</code> is represented five times as often as class <code>0</code>, and you want to balance the class distributions, you could use</p>
<pre><code>sample_weight = np.array([5 if i == 0 else 1 for i in y])
</code></pre>
<p>assigning a weight of <code>5</code> to all instances of class <code>0</code> and a weight of <code>1</code> to all instances of class <code>1</code>. See the link above for slightly more crafty weight-computing functions.</p>
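<p>Putting the pieces together, a minimal end-to-end sketch (the toy data and hyperparameters below are made up for illustration) looks like this:</p>
<pre><code>import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical imbalanced data: class 1 appears five times as often as class 0
rng = np.random.RandomState(0)
X = rng.rand(60, 4)
y = np.array([0] * 10 + [1] * 50)

# Weight 5 for every class-0 sample, 1 for every class-1 sample,
# so the weighted totals of the two classes match
sample_weight = np.array([5 if label == 0 else 1 for label in y])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
</code></pre>
<p>The weights only rebalance how much each sample counts during split evaluation; the shapes of <code>X</code> and <code>y</code> and the rest of the API are unchanged.</p>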