I am training a model to predict a label (the target) based on loan status (e.g. 0, 1, 2, 3), so I have 4 classes. So far I have trained a model as follows:
from HyperclassifierSearch import HyperclassifierSearch
X = data.iloc[:, :-1]
y = data.label
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2,
random_state=42)
# Create a hold out dataset to train the calibrated model to prevent overfitting
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train,
stratify=y_train, test_size=0.2, random_state=42)
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
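# fill_value is only used when strategy='constant'; the default strategy ('mean') ignores it
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)), ('scaler', StandardScaler())])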
preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_cols),
('cat', categorical_transformer, cat_cols)])
# Then I use the HyperclassifierSearch library
models = { 'xgb': Pipeline(steps=[('preprocessor', preprocessor),('clf', XGBClassifier(objective='multi:softprob'))]),
'rf': Pipeline(steps=[('preprocessor', preprocessor),('clf', RandomForestClassifier(criterion = 'entropy', random_state = 42))]) }
search = HyperclassifierSearch(models, params)
best_grid = search.train_model(X_train, y_train, cv=3, n_jobs=-1, scoring='accuracy')
results = search.evaluate_model()
fitted_model = best_grid.best_estimator_
pred = fitted_model.predict_proba(X_test)
labels = fitted_model.predict(X_test)
**Note: I have omitted most of the imports and the params dict because they are large, so only the HyperclassifierSearch part is included.**
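My pred is a matrix with 4 columns, one per loan class. In general, I know it is good practice to calibrate probabilities, especially those from tree-based algorithms, where the output is a score rather than a true probability. However, I am confused about how to calibrate these probabilities.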
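One possible way to use a hold-out split like X_validation / y_validation is to keep the already-fitted model fixed and train only the calibrator on that split. The sketch below is a minimal illustration under stated assumptions, not the setup from the question: the data is synthetic, a plain RandomForestClassifier stands in for best_grid.best_estimator_, and the choice of method="sigmoid" is arbitrary. The try/except is there because recent scikit-learn releases provide FrozenEstimator for this purpose, while older ones use cv="prefit".

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic 4-class data standing in for the loan dataset (an assumption).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)
X_train, X_validation, y_train, y_validation = train_test_split(
    X_train, y_train, stratify=y_train, test_size=0.2, random_state=42)

# Stand-in for best_grid.best_estimator_: any classifier fitted on X_train.
fitted_model = RandomForestClassifier(n_estimators=100, random_state=42)
fitted_model.fit(X_train, y_train)

# Wrap the fitted model so that only the calibrator is trained.
try:
    from sklearn.frozen import FrozenEstimator  # newer scikit-learn releases
    calibrated = CalibratedClassifierCV(FrozenEstimator(fitted_model),
                                        method="sigmoid")
except ImportError:
    # Older releases: mark the estimator as already fitted.
    calibrated = CalibratedClassifierCV(fitted_model, method="sigmoid",
                                        cv="prefit")

# The calibrator sees only the hold-out split, never the training data.
calibrated.fit(X_validation, y_validation)

raw_proba = fitted_model.predict_proba(X_test)
cal_proba = calibrated.predict_proba(X_test)

# Compare a proper scoring rule before and after calibration on the test set.
print("log loss, raw:       ", log_loss(y_test, raw_proba))
print("log loss, calibrated:", log_loss(y_test, cal_proba))
```

Whether calibration actually helps depends on the data and the model, so it seems worth comparing a score such as log loss on the untouched test set, as above, rather than assuming an improvement.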
Should I extend the XGBClassifier above by doing the following?
OneVsRestClassifier(CalibratedClassifierCV(XGBClassifier(objective='multi:softprob'), cv=10))
Source: Multiclass linear SVM in python that return probability
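For comparison, scikit-learn's CalibratedClassifierCV accepts multiclass targets directly: according to its documentation it fits one calibrator per class in a one-vs-rest fashion and then renormalizes the rows, so an outer OneVsRestClassifier may not be needed. The following is a rough sketch under assumptions: synthetic 4-class data, and a RandomForestClassifier standing in for XGBClassifier so that the example depends only on scikit-learn. The values of cv and method are illustrative, not recommendations.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 4-class problem (an assumption; replace with the real features).
X, y = make_classification(n_samples=1200, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

# Unfitted base classifier; CalibratedClassifierCV fits it inside each fold
# and trains the calibrator on the part of the fold it has not seen.
base = RandomForestClassifier(n_estimators=50, random_state=0)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)

# One column per class; each row is renormalized to sum to 1.
proba = calibrated.predict_proba(X_test)
print(proba.shape)
print(proba[:3].round(3))
```

With this cross-validated form the calibrator is trained on folds of the training data, so a separate hold-out split is not strictly required for calibration itself.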