Sklearn RFE with a pipeline

Posted 2024-10-04 07:26:19


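I am trying to use sklearn's RFECV with a pipeline, but I get "could not convert string to float" for a value in a column that is in neither the categorical pipeline nor the numeric pipeline of the ColumnTransformer. Does anyone know a workaround? Here is my pipeline and RFE code: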

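import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data: "num2" and "cat1" are deliberately left out of the feature lists below
data = {"num1": [1,2,3],"num2":[2,np.nan,2],"cat1":["high","low","high"],"cat2":["left","right","right"],
        "target":[4,5,5]}
data = pd.DataFrame(data=data)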

cat_feat = ["cat2"]
num_feat = ["num1"]

X = data[["num1","num2","cat1","cat2"]]
y = data[["target"]]

cat_pipe = Pipeline([
    ('ohe', OneHotEncoder(handle_unknown="ignore"))])

num_pipe = make_pipeline(
    SimpleImputer(missing_values=np.nan, strategy='median'),
    )

columntrans = ColumnTransformer([
    ("cat", cat_pipe, cat_feat),
    ("num", num_pipe, num_feat)
    ],
    remainder="drop",
    n_jobs=-1
)

from sklearn.feature_selection import RFECV, RFE
from sklearn.ensemble import ExtraTreesRegressor

et_pipeline = make_pipeline(columntrans, ExtraTreesRegressor(n_estimators=200, 
                                                              random_state=42, n_jobs=-1))

RFE_model = RFECV(et_pipeline,scoring="neg_mean_squared_error", cv=2, n_jobs=-1)
RFE_model = RFE_model.fit(X, y)
print(RFE_model.n_features_)

So the ValueError is raised for a value in a column that is in neither cat_feat nor num_feat.
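One possible workaround, sketched here under assumptions rather than as a confirmed fix: run the ColumnTransformer first and wrap only the regressor in RFECV, so that RFECV never sees the raw string columns. The toy data is extended to six rows (made-up values) so the two CV folds are less degenerate, and the step names `preprocess` and `rfecv` are chosen only for this illustration. Note that in this arrangement the preprocessing is fitted on all rows before RFECV's internal cross-validation, and the selection happens on the transformed (one-hot encoded) columns rather than on the original ones.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data modelled on the question, extended to six rows (assumed values).
data = pd.DataFrame({
    "num1": [1, 2, 3, 4, 5, 6],
    "num2": [2, np.nan, 2, 3, 1, 2],
    "cat1": ["high", "low", "high", "low", "high", "low"],
    "cat2": ["left", "right", "right", "left", "left", "right"],
    "target": [4, 5, 5, 6, 7, 8],
})
X = data[["num1", "num2", "cat1", "cat2"]]
y = data["target"]  # 1-D target

# Only "cat2" and "num1" are used; the other columns are dropped here,
# before any numeric validation takes place.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat2"]),
        ("num", SimpleImputer(strategy="median"), ["num1"]),
    ],
    remainder="drop",
)

# RFECV wraps the bare estimator, which exposes feature_importances_.
selector = RFECV(
    ExtraTreesRegressor(n_estimators=50, random_state=42),
    scoring="neg_mean_squared_error",
    cv=2,
)

model = Pipeline([("preprocess", preprocess), ("rfecv", selector)])
model.fit(X, y)

n_selected = model.named_steps["rfecv"].n_features_
predictions = model.predict(X)
print(n_selected)
```

Here `n_features_` counts columns after one-hot encoding (two for `cat2` plus `num1`), so it is not directly comparable to the number of original columns.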

Edit: added the error message and a reproducible example

RFE_model = RFE_model.fit_transform(X_train, y_train)
      File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\base.py", line 693, in fit_transform
        return self.fit(X, y, **fit_params).transform(X)
      File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\feature_selection\_rfe.py", line 508, in fit
        X, y = self._validate_data(
      File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\base.py", line 432, in _validate_data
        X, y = check_X_y(X, y, **check_params)
      File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
        return f(**kwargs)
      File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 796, in check_X_y
        X = check_array(X, accept_sparse=accept_sparse,
      File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
        return f(**kwargs)
      File "C:\Users\Timkr\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 599, in check_array
        array = np.asarray(array, order=order, dtype=dtype)
      File "C:\Users\Timkr\anaconda3\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
        return array(a, dtype, copy=False, order=order)
    ValueError: could not convert string to float: 'BHV'
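Reading the traceback, the error comes from RFE's own `fit`, which validates `X` through `check_array` and `np.asarray` before the pipeline (and therefore the ColumnTransformer) gets a chance to drop the unused string columns. The snippet below is a minimal sketch of that conversion step only, using a made-up two-column frame; it does not call sklearn and only assumes that the validation amounts to a float conversion of the whole frame, as the traceback suggests.

```python
import numpy as np
import pandas as pd

# A frame with one numeric and one string column, like the raw X in the question.
X = pd.DataFrame({"num1": [1, 2, 3], "cat1": ["high", "low", "high"]})

try:
    # Roughly what the numeric validation does: convert everything to float.
    np.asarray(X, dtype=np.float64)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

This is why it does not matter that the column is absent from cat_feat and num_feat: the whole frame is converted before any column selection happens.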
