<p>Solution to the error:</p>
<pre><code>ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
</code></pre>
<p>This error means that one class in the <code>target</code> variable occurs only once. To illustrate, consider the following example:</p>
<pre><code>from sklearn.preprocessing import LabelEncoder

random_list = ['a','a','a','b','b','c','d','d','e','e','e']
LE = LabelEncoder()
target = LE.fit_transform(random_list)  # encode each label as an integer
print(target)
</code></pre>
<p>which prints</p>
<pre><code>[0 0 0 1 1 2 3 3 4 4 4]
</code></pre>
<p>Now, if I try to run <code>train_test_split</code> on it, it raises an error:</p>
<pre><code>from sklearn.model_selection import train_test_split

train_test_split(target, test_size=0.2, stratify=target)
#ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
</code></pre>
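<p>This happens because <code>'c'</code> occurs only once, so when <code>stratify</code> is set it is ambiguous whether that single sample belongs in the train set or the test set. For the split to work, every class needs to occur more than once.</p>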
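<p>To see which classes trigger the error, you can count the members of each class before splitting. This is a minimal sketch using only the standard library; the names <code>rare_classes</code> and <code>filtered</code> are illustrative, and dropping the rare samples is only one possible fix (collecting more data or merging classes are others).</p>

```python
from collections import Counter

random_list = ['a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'e', 'e', 'e']

# Count how many times each class label occurs.
counts = Counter(random_list)

# Classes with exactly one member cannot be stratified.
rare_classes = sorted(label for label, n in counts.items() if n == 1)
print(rare_classes)  # ['c']

# One option (an assumption, not the only fix): drop those samples.
filtered = [label for label in random_list if label not in rare_classes]
print(filtered)  # ['a', 'a', 'a', 'b', 'b', 'd', 'd', 'e', 'e', 'e']
```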
<p><strong>Additional error with the example above</strong></p>
<p>Even if I remove <code>'c'</code> from the list above, the split still does not work. We hit a different error:</p>
<pre><code>random_list = ['a','a','a','b','b','d','d','e','e','e']
LE = LabelEncoder()
target = LE.fit_transform(random_list) #produces array([0, 0, 0, 1, 1, 2, 2, 3, 3, 3])
train_test_split(target, test_size=0.2, stratify=target)
#ValueError: The test_size = 2 should be greater or equal to the number of classes = 4
</code></pre>
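<p>For stratification to work, every class has to appear in both the train set and the test set. If there are not enough data points to produce such a distribution, the error above is raised. Here <code>test_size=0.2</code> on 10 samples gives a test set of only 2 samples, which can hold at most 2 of the 4 classes.</p>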
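<p>One way around this, sketched below, is to make the test set at least as large as the number of classes. The example passes an integer <code>test_size</code>, which <code>train_test_split</code> treats as an absolute number of samples; <code>random_state=0</code> is an arbitrary choice for reproducibility, and with a dataset this small the split is for illustration only.</p>

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

random_list = ['a', 'a', 'a', 'b', 'b', 'd', 'd', 'e', 'e', 'e']
target = LabelEncoder().fit_transform(random_list)

n_samples = len(target)
n_classes = len(set(target.tolist()))  # 4 distinct classes

# The test set (and the train set) must each hold at least one sample per class.
n_test = n_classes
assert n_samples - n_test >= n_classes, "not enough samples left for training"

# An integer test_size is interpreted as an absolute sample count.
train, test = train_test_split(
    target, test_size=n_test, stratify=target, random_state=0
)
print(len(train), len(test))
print(sorted(test.tolist()))
```

<p>With 10 samples and 4 classes this leaves 6 samples for training and 4 for testing, so both sides can contain every class.</p>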