I'm trying to use a faster version of RandomizedSearchCV to search for the best parameters: sklearn's HalvingRandomSearchCV class. My dataset is large (about 1 million records), it has 10 classes, and it is imbalanced.
My code looks like this:
# Imports needed for this snippet; HalvingRandomSearchCV is still
# experimental and must be enabled explicitly before it can be imported
from catboost import CatBoostClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingRandomSearchCV, StratifiedShuffleSplit
from sklearn.metrics import make_scorer, matthews_corrcoef

# Define parameters for model
params_cat = {'loss_function': 'MultiClass',
              'eval_metric': 'AUC',
              'random_state': seed,
              'iterations': 20000,
              'early_stopping_rounds': 5000}

# Define parameters for the search grid
params_grid = {'max_depth': [8, 10],
               'l2_leaf_reg': [3, 5],  # 5, 10, 15
               # 'random_strength': [3],  # 5
               # 'border_count': [128],  # 254
               # 'iterations': [20000, 30000],  # 40000
               }

cat = CatBoostClassifier(**params_cat)
mcc_scorer = make_scorer(matthews_corrcoef)
cv = StratifiedShuffleSplit(n_splits=2, test_size=0.01, random_state=42)
gridsearch = HalvingRandomSearchCV(cat, params_grid, n_jobs=-1, cv=cv, verbose=3,
                                   scoring=mcc_scorer, return_train_score=True)
gridsearch.fit(X_train_val, y_train_val)
When I run it, I get the following warnings and scores:
/home/ec2-user/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars
mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
[CV 1/2] END l2_leaf_reg=5, max_depth=9;, score=(train=1.000, test=0.000) total time= 6.0min
/home/ec2-user/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars
mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
[CV 2/2] END l2_leaf_reg=5, max_depth=9;, score=(train=1.000, test=0.000) total time= 6.0min
/home/ec2-user/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_search.py:925: UserWarning: One or more of the test scores are non-finite: [nan nan nan nan 0. 0.]
category=UserWarning
I don't know whether the cause of the error is HalvingRandomSearchCV, the scorer I'm using, or the parameter values. Does anyone know what the problem is?
Thanks, everyone!
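For reference, the RuntimeWarning comes from MCC's denominator being zero: when either the true labels or the predictions in a fold contain only a single class, `cov_ytyt * cov_ypyp` is 0 and the division yields nan. Tiny evaluation folds (here, `StratifiedShuffleSplit` with `test_size=0.01` on top of halving's small initial sample budget) make this easy to hit on imbalanced multiclass data. A minimal sketch of the degenerate case, using made-up labels (recent scikit-learn maps it to 0.0; older versions first emit the warning shown above):

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

# Hypothetical tiny fold from an imbalanced multiclass problem:
# the model predicts the majority class for every sample.
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([1, 1, 1, 1, 1])  # constant predictions

# MCC's denominator is 0 here, so the raw formula gives 0/0 = nan;
# scikit-learn falls back to returning 0.0 for this case.
print(matthews_corrcoef(y_true, y_pred))  # 0.0
```

This would explain the `score=(train=1.000, test=0.000)` lines: the overfit model scores a perfect MCC on its training subsample but a degenerate 0 on the tiny test folds.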