All independent variables are categorical; the dependent (target) variable is continuous.

from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import cross_val_score

kfolds = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
cross_val_score(pipe, X, y, cv=kfolds).mean()

1 answer

User

#1 · posted 2024-09-28 23:37:04

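It looks like the category you supplied (in xinp, i.e. the value Sec 10) does not occur in the training data, so it cannot be one-hot encoded: the fitted encoder has no dummy variable (no corresponding binary column) for it. One possible solution is: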

from sklearn.preprocessing import OneHotEncoder

ohc = OneHotEncoder(categories="auto", handle_unknown="ignore")

From the scikit-learn OneHotEncoder documentation:

handle_unknown : {'error', 'ignore'}, default='error'
Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to ‘ignore’ and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. In the inverse transform, an unknown category will be denoted as None.
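
To make the behaviour described above concrete, here is a minimal sketch that contrasts the two settings. The training values "Sec 1" to "Sec 3" are made up for illustration, and "Sec 10" stands in for the unseen category from the question; the real data and pipeline are not shown in the post.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: one categorical column with three known values.
X_train = [["Sec 1"], ["Sec 2"], ["Sec 3"]]
# A new sample whose category was never seen during fit.
x_new = [["Sec 10"]]

# With handle_unknown="error", transform raises on the unseen value.
strict = OneHotEncoder(handle_unknown="error").fit(X_train)
try:
    strict.transform(x_new)
    raised = False
except ValueError:
    raised = True

# With handle_unknown="ignore", the unseen value is encoded as all zeros.
lenient = OneHotEncoder(handle_unknown="ignore").fit(X_train)
encoded = lenient.transform(x_new).toarray()

print(raised)   # whether the strict encoder rejected the sample
print(encoded)  # one row, one column per category seen in training
```

Note that an all-zero row means the model treats the sample as belonging to none of the known categories, so predictions for such rows should be interpreted with care.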
