UserWarning when running HalvingRandomSearchCV: One or more of the test scores are non-finite: [nan 0.0.]

Posted 2024-05-18 18:22:59


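I am trying to search for the best parameters with a faster alternative to RandomizedSearchCV: sklearn's HalvingRandomSearchCV class. My dataset is large (about 1 million records), has 10 classes, and is imbalanced.

My code looks like this: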

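from catboost import CatBoostClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (must come before the HalvingRandomSearchCV import)
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import HalvingRandomSearchCV, StratifiedShuffleSplit

# Define parameters for the model (seed is defined elsewhere)
params_cat = {'loss_function': 'MultiClass',
              'eval_metric': 'AUC',
              'random_state': seed,
              'iterations': 20000,
              'early_stopping_rounds': 5000}

# Define parameters for the search grid
params_grid = {'max_depth': [8, 10],
               'l2_leaf_reg': [3, 5],  # 5, 10, 15
               # 'random_strength': [3],  # 5
               # 'border_count': [128],  # 254
               # 'iterations': [20000, 30000]  # 40000
               }

cat = CatBoostClassifier(**params_cat)

# Score with the Matthews correlation coefficient on stratified shuffle splits
mcc_scorer = make_scorer(matthews_corrcoef)
cv = StratifiedShuffleSplit(n_splits=2, test_size=0.01, random_state=42)
gridsearch = HalvingRandomSearchCV(cat, params_grid, n_jobs=-1, cv=cv, verbose=3,
                                   scoring=mcc_scorer, return_train_score=True)

gridsearch.fit(X_train_val, y_train_val)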
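
One thing that may be worth checking is how few rows each early halving iteration gets. The sketch below is only an estimate under stated assumptions, not what the search actually did here: it assumes the default `factor=3`, the default `min_resources='smallest'` heuristic (documented as `n_splits * n_classes * 2` for classifiers), and that a float `test_size` is rounded up to a whole number of test rows. The helper name `schedule` is hypothetical.

```python
import math

# Values taken from the question; factor=3 is the assumed library default.
n_splits, n_classes, factor = 2, 10, 3
test_size = 0.01

# Assumed 'smallest' heuristic for a classifier when the resource is n_samples.
min_resources = n_splits * n_classes * 2


def schedule(n_iterations):
    """Estimate (iteration, n_samples, n_test_rows) for each halving iteration."""
    rows = []
    for i in range(n_iterations):
        n_samples = min_resources * factor ** i
        # Assumption: a fractional test size is rounded up to whole rows.
        n_test = math.ceil(test_size * n_samples)
        rows.append((i, n_samples, n_test))
    return rows


for i, n_samples, n_test in schedule(2):
    print(f"iter {i}: n_samples={n_samples}, test rows per split={n_test}")
```

If these assumptions hold, the first iterations would be scored on only one or two test rows, far fewer than the 10 classes, so a larger `test_size` or an explicit `min_resources` might be worth trying.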

When I run it, I get the following error:

/home/ec2-user/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
[CV 1/2] END l2_leaf_reg=5, max_depth=9;, score=(train=1.000, test=0.000) total time= 6.0min
/home/ec2-user/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
[CV 2/2] END l2_leaf_reg=5, max_depth=9;, score=(train=1.000, test=0.000) total time= 6.0min
/home/ec2-user/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_search.py:925: UserWarning: One or more of the test scores are non-finite: [nan nan nan nan  0.  0.]
  category=UserWarning
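
The expression in the RuntimeWarning can be reproduced in isolation. The sketch below is a simplified NumPy re-implementation of the quantities named in the traceback (`cov_ytyp`, `cov_ytyt`, `cov_ypyp`), not sklearn's actual source, and the helper name `raw_mcc` is hypothetical. It assumes the predictions on a test split all fall into a single class, which the log does not confirm.

```python
import numpy as np


def raw_mcc(y_true, y_pred):
    """Multiclass MCC computed from a confusion matrix, without any nan handling."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    k = len(labels)

    # Build the confusion matrix: rows are true labels, columns are predictions.
    C = np.zeros((k, k), dtype=np.float64)
    np.add.at(C, (np.searchsorted(labels, y_true), np.searchsorted(labels, y_pred)), 1)

    t_sum = C.sum(axis=1)      # number of samples per true class
    p_sum = C.sum(axis=0)      # number of samples per predicted class
    n_correct = np.trace(C)
    n_samples = p_sum.sum()

    cov_ytyp = n_correct * n_samples - np.dot(t_sum, p_sum)
    cov_ypyp = n_samples ** 2 - np.dot(p_sum, p_sum)
    cov_ytyt = n_samples ** 2 - np.dot(t_sum, t_sum)

    # Silence the 0/0 warning so the nan result is easy to inspect.
    with np.errstate(invalid="ignore", divide="ignore"):
        return cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)


# Assumed scenario: every prediction is the same class, so cov_ypyp is 0.
print(raw_mcc([0, 1, 2, 1], [1, 1, 1, 1]))
# Perfect predictions for comparison.
print(raw_mcc([0, 1, 2, 1], [0, 1, 2, 1]))
```

When either the true labels or the predictions in a split contain a single class, the denominator is zero and the raw ratio is 0/0. A test score of 0.000 next to this warning would be consistent with the scorer replacing that undefined value with 0.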

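I don't know whether the cause of the error is HalvingRandomSearchCV, the scorer I am using, or the parameter values. Does anyone know where the problem is?

Thanks, everyone!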


