XGB regressor keeps returning 100% accuracy

    X = multiSdata.filter([
        'col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8',
        'col9', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16',
        'col17', 'col18', 'col19', 'col20', 'col21', 'col22', 'col23', 'col24',
    ])
    # retain the original feature labels
    feature_labels = pd.Series(X.columns.values)
    X.head(5)

    from xgboost import XGBRegressor

    xgb = XGBRegressor(
        learning_rate=0.1,
        n_estimators=1000,
        max_depth=5,
        min_child_weight=1,
        gamma=0.1,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='binary:logistic',
        nthread=4,
        scale_pos_weight=1,
        seed=21,
        eval_metric=['auc', 'error'],
    )

    import xgboost as xgb  # note: rebinds the name `xgb` used for the regressor above

    params = {
        'learning_rate': 0.1,
        'n_estimators': 1000,
        'max_depth': 5,
        'min_child_weight': 1,
        'gamma': 0.1,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
        'objective': 'binary:logistic',
        'nthread': 4,
        'scale_pos_weight': 1,
        'seed': 21,
        'eval_metric': ['auc', 'error'],
    }
    xg_train = xgb.DMatrix(data=X_train, label=y_train)
    cv_results = xgb.cv(params, xg_train, num_boost_round=10, nfold=5, early_stopping_rounds=10)
    cv_results

1 Answer

User

#1 · Posted 2024-09-27 04:20:11

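Oversampling may be producing leaky new cases that effectively duplicate test-set cases. Holding out a test set, as you did, may not prevent this. Only deduplicating the combined train + test data will.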
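A minimal sketch of that deduplication, assuming the features and label sit together in pandas DataFrames. The names `train_df`, `test_df`, and `drop_test_overlap` are hypothetical and the toy data is made up. It only catches exact row matches; near-duplicates created by an interpolating oversampler would need a different check.

```python
import pandas as pd


def drop_test_overlap(train_df: pd.DataFrame, test_df: pd.DataFrame) -> pd.DataFrame:
    """Return train_df without exact duplicates and without rows that also occur in test_df."""
    # Remove exact duplicates inside the training set first.
    train_unique = train_df.drop_duplicates()
    # Left-join against the distinct test rows on all shared columns;
    # the indicator column marks training rows that also exist in the test set.
    merged = train_unique.merge(test_df.drop_duplicates(), how="left", indicator=True)
    keep = merged["_merge"] == "left_only"
    return merged.loc[keep].drop(columns="_merge").reset_index(drop=True)


# Toy data (made up for illustration): the row (1, 10, 0) also appears in the test set.
train_df = pd.DataFrame(
    {"col1": [1, 1, 2, 3], "col2": [10, 10, 20, 30], "label": [0, 0, 1, 0]}
)
test_df = pd.DataFrame({"col1": [1, 4], "col2": [10, 40], "label": [0, 1]})

clean_train = drop_test_overlap(train_df, test_df)
```

If the 100% score drops noticeably after training on `clean_train`, that would suggest the earlier result came from leaked rows rather than from the model.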

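If feasible, consider undersampling instead (it introduces no new leakage, although leakage is still possible). Done correctly, neither sampling method should affect accuracy much, so deduplication is strongly recommended.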
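Random undersampling could look roughly like this. It is a sketch under assumptions: the helper `undersample`, the column name `label`, and the toy data are all hypothetical, and it would be applied to the training split only. Because it only drops existing rows, it cannot create new copies of test cases.

```python
import pandas as pd


def undersample(df: pd.DataFrame, label_col: str, seed: int = 21) -> pd.DataFrame:
    """Randomly downsample every class to the size of the smallest class."""
    n_min = df[label_col].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy training data (made up): 6 negatives, 2 positives.
train_df = pd.DataFrame({"col1": range(8), "label": [0, 0, 0, 0, 0, 0, 1, 1]})
balanced = undersample(train_df, "label")
```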

As for cross-validation, make sure to remove duplicates first, for the same reason.
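The ordering matters: deduplicate, then split. A small sketch of that order using only pandas and NumPy, where `dedup_then_folds` and the toy frame are hypothetical. In the question's setup, the deduplicated frame is what would be turned into the `xgb.DMatrix` before calling `xgb.cv`.

```python
import numpy as np
import pandas as pd


def dedup_then_folds(df: pd.DataFrame, n_folds: int = 5, seed: int = 21):
    """Drop exact duplicate rows, then split the remaining row positions into folds."""
    unique_df = df.drop_duplicates().reset_index(drop=True)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(unique_df))
    # Each array holds the row positions of one validation fold.
    folds = np.array_split(shuffled, n_folds)
    return unique_df, folds


# Toy data (made up): rows 0/1 and rows 5/6 are exact duplicates.
df = pd.DataFrame(
    {"col1": [1, 1, 2, 3, 4, 5, 5, 6], "label": [0, 0, 1, 0, 1, 0, 0, 1]}
)
unique_df, folds = dedup_then_folds(df, n_folds=3)
```

Splitting first and deduplicating afterwards would not help, since identical rows could already have landed in different folds.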
