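I am trying to use scikit-learn's RFECV with a pipeline, but I get "could not convert string to float" for a value in a column that is used by neither the categorical pipeline nor the numeric pipeline of my ColumnTransformer. Does anyone know a way around this? Here is my pipeline and RFE code: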
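import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data: two numeric and two categorical columns plus the target.
data = pd.DataFrame({
    "num1": [1, 2, 3],
    "num2": [2, np.nan, 2],
    "cat1": ["high", "low", "high"],
    "cat2": ["left", "right", "right"],
    "target": [4, 5, 5],
})

# Only these columns are meant to reach the model; num2 and cat1 should be dropped.
cat_feat = ["cat2"]
num_feat = ["num1"]
X = data[["num1", "num2", "cat1", "cat2"]]
y = data["target"]  # 1-D target instead of a one-column DataFrame

cat_pipe = Pipeline([("ohe", OneHotEncoder(handle_unknown="ignore"))])
num_pipe = make_pipeline(SimpleImputer(missing_values=np.nan, strategy="median"))

columntrans = ColumnTransformer(
    [("cat", cat_pipe, cat_feat), ("num", num_pipe, num_feat)],
    remainder="drop",
    n_jobs=-1,
)

et_pipeline = make_pipeline(
    columntrans,
    ExtraTreesRegressor(n_estimators=200, random_state=42, n_jobs=-1),
)

# This is the call that raises the ValueError shown in the traceback.
RFE_model = RFECV(et_pipeline, scoring="neg_mean_squared_error", cv=2, n_jobs=-1)
RFE_model = RFE_model.fit(X, y)
print(RFE_model.n_features_)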
So the ValueError is raised for a value in a column that is in neither cat_feat nor num_feat.
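One plausible reading of this symptom is that the whole of X is converted to a numeric array before the ColumnTransformer has a chance to drop the unused columns. The sketch below is not scikit-learn's own code. It only mimics that conversion with a plain `np.asarray(..., dtype=float)` call on the question's toy frame, and `try_numeric` is a hypothetical helper introduced for illustration.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question, without the target column.
X = pd.DataFrame({
    "num1": [1, 2, 3],
    "num2": [2, np.nan, 2],
    "cat1": ["high", "low", "high"],
    "cat2": ["left", "right", "right"],
})


def try_numeric(frame):
    """Return None if the frame converts to a float array, else the error text."""
    try:
        np.asarray(frame, dtype=float)
        return None
    except (ValueError, TypeError) as exc:
        return str(exc)


# The full frame fails because of the string columns, including cat1,
# which the ColumnTransformer would have dropped anyway.
full_error = try_numeric(X)

# A purely numeric selection converts without complaint.
numeric_error = try_numeric(X[["num1", "num2"]])

print(full_error)
print(numeric_error)
```

If the conversion fails on the full frame but succeeds on the numeric columns alone, the error depends on what is passed to `fit`, not on which columns the transformer is configured to use.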
Edit: added the error message and a reproducible example.
    RFE_model = RFE_model.fit_transform(X_train, y_train)
  File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\base.py", line 693, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\feature_selection\_rfe.py", line 508, in fit
    X, y = self._validate_data(
  File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
    return f(**kwargs)
  File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 796, in check_X_y
    X = check_array(X, accept_sparse=accept_sparse,
  File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
    return f(**kwargs)
  File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 599, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "C:\Users\Timkr\anaconda3\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'BHV'
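The traceback points at RFECV's own input validation (`_rfe.py` calling `_validate_data`, then `check_array`), so the failure appears to happen before the pipeline's ColumnTransformer ever runs. One possible workaround, sketched here under that assumption, is to apply the ColumnTransformer first and hand RFECV only the resulting numeric array together with the bare estimator. The data below is hypothetical (40 random rows shaped like the question's frame, so that cv=2 has enough samples), and `sparse_threshold=0` is set so the transformer always returns a dense array.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data with the same column layout as in the question.
rng = np.random.RandomState(42)
n = 40
num1 = rng.normal(size=n)
cat2 = rng.choice(["left", "right"], size=n)
target = 2.0 * num1 + (cat2 == "left").astype(float)
num1[::7] = np.nan  # a few missing values for the imputer to fill

data = pd.DataFrame({
    "num1": num1,
    "num2": rng.normal(size=n),
    "cat1": rng.choice(["high", "low"], size=n),
    "cat2": cat2,
    "target": target,
})

cat_feat = ["cat2"]
num_feat = ["num1"]
X = data[["num1", "num2", "cat1", "cat2"]]
y = data["target"]

columntrans = ColumnTransformer(
    [
        ("cat", make_pipeline(OneHotEncoder(handle_unknown="ignore")), cat_feat),
        ("num", make_pipeline(SimpleImputer(strategy="median")), num_feat),
    ],
    remainder="drop",      # num2 and cat1 never reach the model
    sparse_threshold=0,    # always return a dense array
)

# Preprocess first, so RFECV only ever sees numeric data.
X_num = columntrans.fit_transform(X)

selector = RFECV(
    ExtraTreesRegressor(n_estimators=50, random_state=42),
    scoring="neg_mean_squared_error",
    cv=2,
)
selector.fit(X_num, y)

print(selector.n_features_)
print(selector.support_)
```

Two caveats with this design: the selected features refer to the transformed columns (the one-hot columns and the imputed num1), not to the original DataFrame columns, and fitting the transformer on all rows before cross-validation lets the imputer's median and the encoder's categories see the validation folds.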