scikit-learn GridSearchCV returns the wrong mean value

    from sklearn.model_selection import GridSearchCV

    gscv = GridSearchCV(
        n_jobs=n_jobs,
        cv=train_test_iterable,
        estimator=pipeline,
        param_grid=param_grid,
        verbose=10,
        scoring=['accuracy', 'precision', 'recall', 'f1'],
        refit='f1',
        return_train_score=return_train_score,
        error_score=error_score,
    )
    gscv.fit(X, Y)
    gscv.cv_results_

F1 columns from cv_results_ (the last column, "Actual Mean", is the plain average of the two split scores):

| mean_test_f1 | split0_test_f1 | split1_test_f1 | Actual Mean |
|---|---|---|---|
| 0.934310796 | 0.935603198 | 0.933665455 | 0.934634326 |
| 0.931279716 | 0.908430118 | 0.942689316 | 0.925559717 |
| 0.927683609 | 0.912005672 | 0.935512149 | 0.923758911 |
| 0.680908006 | 0.741198823 | 0.650802701 | 0.696000762 |
| 0.680908006 | 0.741198823 | 0.650802701 | 0.696000762 |
| 0.646005028 | 0.684483208 | 0.626791532 | 0.65563737 |
| 0.840273248 | 0.847484083 | 0.836672627 | 0.842078355 |
| 0.837160828 | 0.847484083 | 0.832006068 | 0.839745075 |
| 0.833637 | 0.842109375 | 0.829406448 | 0.835757911 |

2 Answers

User

Answer 1 · edited 2024-07-01 08:18:37

Try setting iid=False in GridSearchCV(...) and then compare the results.

According to the documentation:

iid : boolean, default=True

    If True, the data is assumed to be identically distributed across 
    the folds, and the loss minimized is the total loss per sample,
    and not the mean loss across the folds.
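Written out as formulas (the notation is introduced here, not taken from the documentation): with $K$ test folds, $s_k$ the score on fold $k$ and $n_k$ the number of test samples in fold $k$, the reported mean is

$$
\bar{s}_{\text{iid=True}} = \frac{\sum_{k=1}^{K} n_k \, s_k}{\sum_{k=1}^{K} n_k},
\qquad
\bar{s}_{\text{iid=False}} = \frac{1}{K} \sum_{k=1}^{K} s_k
$$

The two agree whenever every fold has the same number of test samples; with unequal folds the iid=True mean is pulled toward the score of the larger fold.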

So when iid is True (the default), the mean of the test scores is weighted, as specified here in the source code:

    _store('test_%s' % scorer_name, test_scores[scorer_name],
                   splits=True, rank=True,
                   weights=test_sample_counts if iid else None)
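A minimal pure-Python sketch of what that weights argument changes. fold_mean is a hypothetical stand-in written for illustration, not a scikit-learn function, and the fold sizes are made up because the question does not give them. (As far as I know, iid was deprecated in scikit-learn 0.22 and removed in 0.24, so in current versions mean_test_* is always the unweighted mean across splits.)

```python
# Hypothetical stand-in for the averaging behind mean_test_*: plain mean when
# no counts are given (iid=False), test-size-weighted mean otherwise (iid=True).


def fold_mean(scores, test_sample_counts=None):
    """Average per-fold scores; weight by test-fold size if counts are given."""
    if test_sample_counts is None:
        # iid=False behaviour: every fold counts equally
        return sum(scores) / len(scores)
    # iid=True behaviour: each fold is weighted by its number of test samples
    total = sum(test_sample_counts)
    return sum(s * n for s, n in zip(scores, test_sample_counts)) / total


# split0/split1 test F1 from the first row of the question's table
split_scores = [0.935603198, 0.933665455]
# HYPOTHETICAL test-fold sizes, for illustration only
hypothetical_counts = [100, 200]

print(fold_mean(split_scores))                       # plain mean, like iid=False
print(fold_mean(split_scores, hypothetical_counts))  # weighted mean, like iid=True
```

The plain mean reproduces the question's "Actual Mean" value for that row, while the weighted one moves toward split 1's score because split 1 is the larger (made-up) fold here.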

Note that the training scores are not affected by this weighting, so also cross-check the mean of the training scores.
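One way to do that cross-check is a small helper that recomputes the plain mean from the per-split columns and lists it next to the reported mean. recompute_means is a hypothetical name introduced here; the sketch only assumes scikit-learn's cv_results_ key naming (split0_test_f1, mean_test_f1, and so on). The train columns exist only when return_train_score=True.

```python
# Hypothetical helper: recompute the plain mean of the split columns in a
# cv_results_-style dict and return it next to the reported mean.


def recompute_means(cv_results, metric, subset="test"):
    """Return (reported, plain) lists for e.g. metric='f1', subset='test' or 'train'."""
    suffix = f"_{subset}_{metric}"
    # collect split0_<subset>_<metric>, split1_..., ordered by split index
    split_keys = sorted(
        (k for k in cv_results if k.startswith("split") and k.endswith(suffix)),
        key=lambda k: int(k[len("split"):].split("_")[0]),
    )
    reported = list(cv_results[f"mean{suffix}"])
    plain = [
        sum(cv_results[k][i] for k in split_keys) / len(split_keys)
        for i in range(len(reported))
    ]
    return reported, plain


# Toy input built from the first two rows of the question's table
toy = {
    "split0_test_f1": [0.935603198, 0.908430118],
    "split1_test_f1": [0.933665455, 0.942689316],
    "mean_test_f1": [0.934310796, 0.931279716],
}
reported, plain = recompute_means(toy, "f1")
for r, p in zip(reported, plain):
    print(f"reported={r:.6f}  plain={p:.6f}")
```

With a fitted search, the same call would be recompute_means(gscv.cv_results_, "f1") for the test scores and recompute_means(gscv.cv_results_, "f1", "train") for the training scores; if the weighting explanation holds, the train pairs should match while the test pairs differ.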

User

Answer 2 · edited 2024-07-01 08:18:37

I think what you are seeing is a weighted average, not a simple (unweighted) average.
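That explanation can be checked against the question's own numbers. If mean_test_f1 is a weighted mean of the two splits, then mean = w·s0 + (1 − w)·s1 for some weight w, and w can be solved row by row (implied_split0_weight is a name introduced here for illustration). Worked by hand, w comes out at about 0.333 for every row of the table. A constant weight that is not 0.5 is what a test-size-weighted mean would produce if split 0's test fold holds roughly a third of the test samples; the fold sizes themselves are not given in the question, so this is an inference rather than a confirmation.

```python
# (mean_test_f1, split0_test_f1, split1_test_f1) for each row of the question's table
rows = [
    (0.934310796, 0.935603198, 0.933665455),
    (0.931279716, 0.908430118, 0.942689316),
    (0.927683609, 0.912005672, 0.935512149),
    (0.680908006, 0.741198823, 0.650802701),
    (0.680908006, 0.741198823, 0.650802701),
    (0.646005028, 0.684483208, 0.626791532),
    (0.840273248, 0.847484083, 0.836672627),
    (0.837160828, 0.847484083, 0.832006068),
    (0.833637, 0.842109375, 0.829406448),
]


def implied_split0_weight(mean, s0, s1):
    """Solve mean = w*s0 + (1 - w)*s1 for w, the weight given to split 0."""
    return (mean - s1) / (s0 - s1)


for mean, s0, s1 in rows:
    print(f"{implied_split0_weight(mean, s0, s1):.4f}")
```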
