scikit-learn error: The least populated class in y has only 1 member

Posted 2024-05-20 17:21:55


I am trying to split my dataset into a training set and a test set using the train_test_split function from scikit-learn, but I get the following error:

In [1]: y.iloc[:,0].value_counts()
Out[1]: 
M2    38
M1    35
M4    29
M5    15
M0    15
M3    15

In [2]: xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=1/3, random_state=85, stratify=y)
Out[2]: 
Traceback (most recent call last):
  File "run_ok.py", line 48, in <module>
    xtrain,xtest,ytrain,ytest = train_test_split(X,y,test_size=1/3,random_state=85,stratify=y)
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1700, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 953, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1259, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

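However, every class has at least 15 samples. Why am I getting this error?

X is a pandas DataFrame representing the data points, and y is a single-column pandas DataFrame containing the target variable.

I cannot post the original data because it is proprietary, but the problem should be fairly easy to reproduce: create a random pandas DataFrame (X) with 1k rows and 500 columns, and a second pandas DataFrame (y) with the same number of rows (1k) and a single column holding the target variable (a categorical label) for each row. The y DataFrame should contain several distinct categorical labels (e.g. "class1", "class2", ...), each occurring at least 15 times.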
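The reproduction data described above could be built roughly as follows. This is only a sketch: the helper name make_repro_data, the column name "target", the seed, and the choice of six labels are all assumptions for illustration, not part of the original data. Whether the error actually appears with this data depends on the installed scikit-learn version.

```python
import numpy as np
import pandas as pd


def make_repro_data(n_rows=1000, n_cols=500, n_classes=6, seed=0):
    """Build random X (n_rows x n_cols) and a one-column label DataFrame y.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Random numeric features, one row per data point.
    X = pd.DataFrame(rng.normal(size=(n_rows, n_cols)))
    # Labels "class0", "class1", ...; cycling through them guarantees that
    # each label occurs at least n_rows // n_classes times.
    labels = np.array([f"class{i}" for i in range(n_classes)])
    y_values = rng.permutation(labels[np.arange(n_rows) % n_classes])
    # y is deliberately a one-column DataFrame, as in the question.
    y = pd.DataFrame({"target": y_values})
    return X, y


X, y = make_repro_data()
print(X.shape, y.shape)
print(y.iloc[:, 0].value_counts())
```

Note that y here has shape (1000, 1), i.e. it is two-dimensional, which is the detail the question turns on.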


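1 Answer

User

#1 · Posted 2024-05-20 17:21:55

The problem was that train_test_split takes two arrays as input, but the y array is a one-column matrix (a DataFrame) rather than a one-dimensional array. If I pass only the first column of y, it works.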

# y has a single column, so the first (and only) column is at position 0
xtrain, xtest, ytrain, ytest = train_test_split(X, y.iloc[:, 0], test_size=1/3,
  random_state=85, stratify=y.iloc[:, 0])
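For a self-contained check of this fix, here is a small sketch that reuses the class counts from the question on synthetic data. The feature values, the column name "label", and the seed are assumptions made up for the example. It selects the single column of y as a one-dimensional Series and uses it both as the target and as the stratify argument.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Class counts copied from the value_counts() output in the question.
counts = {"M2": 38, "M1": 35, "M4": 29, "M5": 15, "M0": 15, "M3": 15}
labels = np.repeat(list(counts), list(counts.values()))

# Synthetic stand-ins for the proprietary data (assumption).
y = pd.DataFrame({"label": rng.permutation(labels)})  # one-column DataFrame
X = pd.DataFrame(rng.normal(size=(len(y), 5)))

# Select the only column so the target is a 1-D Series, not a 2-D frame.
# y.squeeze("columns") is an equivalent way to get the same Series.
y_1d = y.iloc[:, 0]

xtrain, xtest, ytrain, ytest = train_test_split(
    X, y_1d, test_size=1/3, random_state=85, stratify=y_1d
)

# With stratification, every class should appear in both splits.
print(ytrain.value_counts())
print(ytest.value_counts())
```

Printing the per-class counts of both splits is a quick way to confirm that stratification kept the class proportions roughly intact.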
