Custom estimator cannot be cloned by cross_val_score

Posted 2024-09-30 18:15:22


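I have a custom estimator that I implemented myself, and I cannot use it with cross_val_score(). I believe the problem is related to my predict() method. Here is the full traceback: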

Traceback (most recent call last):
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/test.py", line 30, in <module>
    ada2_score = cross_val_score(ada_2, X, y, cv=5)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 390, in cross_val_score
    error_score=error_score)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 236, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 1004, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 835, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 754, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 209, in apply_async
    result = ImmediateResult(func)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 590, in __init__
    self.results = batch()
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 544, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 591, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 89, in __call__
    score = scorer(estimator, *args, **kwargs)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 371, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/Adaboost.py", line 92, in score
    scr_pred = self.predict(X)
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/Adaboost.py", line 73, in predict
    clf_pred = clf.predict(X)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn_extensions/extreme_learning_machines/elm.py", line 614, in predict
    class_predictions = self.binarizer.inverse_transform(raw_predictions)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 528, in inverse_transform
    self.classes_, threshold)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 750, in _inverse_binarize_thresholding
    format(y.shape))
ValueError: output_type='binary', but y.shape = (30, 3)

My predict(self, X) method returns a vector of length n_samples containing the predictions for X. I also wrote a score() method, shown below:

def score(self, X, y):
    scr_pred = self.predict(X)
    return sum(scr_pred == y) / X.shape[0]

This method simply computes the model's accuracy on the given samples. Whether I use this score() method or pass scoring="accuracy" to cross_val_score(...), it does not work.
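For reference, the score() above computes plain accuracy, the same quantity as sklearn.metrics.accuracy_score. ClassifierMixin.score also returns mean accuracy, so inheriting from ClassifierMixin would give an equivalent score() and would also make cross_val_score treat the estimator as a classifier (stratified folds for an integer cv). A small check with made-up labels; manual_accuracy is a hypothetical helper mirroring the question's formula:

```python
import numpy as np
from sklearn.metrics import accuracy_score


def manual_accuracy(y_true, y_pred):
    """Fraction of matching labels, the same formula as the question's score()."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.sum(y_pred == y_true) / y_true.shape[0]


# Made-up labels for illustration only.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

print(manual_accuracy(y_true, y_pred))  # 4 of 6 labels match
print(accuracy_score(y_true, y_pred))   # same value from scikit-learn
```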

Note: I am aware of this question/answer, but it does not apply to my case, because I can confirm that my constructor is consistent:

def __init__(self, estimators=["MLP"], n_rounds=5, random_state=10):
    self.estimators = estimators
    self.n_rounds = n_rounds
    self.random_state = random_state
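One way to confirm constructor consistency is to call sklearn.base.clone directly, since cross_val_score clones the estimator for every fold. clone() rebuilds the object from get_params() and raises RuntimeError if the constructor does not store a parameter unchanged. A minimal sketch; both class names are hypothetical stand-ins, and HypotheticalBoost copies only the question's constructor, not its fit logic:

```python
from sklearn.base import BaseEstimator, clone


class HypotheticalBoost(BaseEstimator):
    """Stand-in with the same constructor as in the question (no fit logic)."""

    def __init__(self, estimators=["MLP"], n_rounds=5, random_state=10):
        # Every argument is stored unchanged, as clone() expects.
        self.estimators = estimators
        self.n_rounds = n_rounds
        self.random_state = random_state


class ModifiesParams(BaseEstimator):
    """Counter-example: the constructor copies a parameter."""

    def __init__(self, estimators=["MLP"]):
        # list() creates a new object, so clone()'s identity check fails.
        self.estimators = list(estimators)


def can_clone(est):
    """Return True if sklearn.base.clone accepts the estimator."""
    try:
        clone(est)
        return True
    except RuntimeError:
        return False


print(can_clone(HypotheticalBoost()))  # True
print(can_clone(ModifiesParams()))     # False
```

Passing this check only shows that the constructor is clone-safe; it says nothing about state created later in fit().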

Update

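Further research led me to this topic, which explains that sklearn cannot deep-copy estimators that contain transformers. However, my estimator has to run a LabelBinarizer to transform the data in order to obtain predictions. I have therefore updated the question title to describe the actual problem.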
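A minimal sketch of the usual scikit-learn convention, assuming the binarizer is currently created in __init__ or reused across fits: create the LabelBinarizer inside fit() and store it as a fitted attribute with a trailing underscore, so that clone() rebuilds the estimator from constructor parameters only and each fold gets fresh state. BinarizedLstsqClassifier is hypothetical: a toy least-squares model standing in for the ELM weak learners, not the question's AdaBoost code. The iris dataset is used only as a convenient 3-class example.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelBinarizer


class BinarizedLstsqClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: binarize labels, fit least squares, decode."""

    def __init__(self, fit_intercept=True):
        # Only hyperparameters here; no transformer objects.
        self.fit_intercept = fit_intercept

    def _design(self, X):
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            X = np.hstack([X, np.ones((X.shape[0], 1))])
        return X

    def fit(self, X, y):
        # A new binarizer on every fit, stored as a fitted attribute.
        self.binarizer_ = LabelBinarizer()
        Y = self.binarizer_.fit_transform(y)  # (n, k), or (n, 1) if binary
        self.classes_ = self.binarizer_.classes_
        self.coef_, *_ = np.linalg.lstsq(self._design(X), Y, rcond=None)
        return self

    def predict(self, X):
        # Same number of columns as Y in fit(), so inverse_transform agrees.
        raw = self._design(X) @ self.coef_
        return self.binarizer_.inverse_transform(raw)


X, y = load_iris(return_X_y=True)
scores = cross_val_score(BinarizedLstsqClassifier(), X, y, cv=5)
print(scores)
```

Because the regression output is fitted against the binarized targets from the same fit() call, predict() always gives inverse_transform a matrix with the column count it expects, whether the training labels are binary or multiclass.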


1 answer
User
Answer 1 · Posted 2024-09-30 18:15:22

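Your problem statement is not entirely clear, but judging from the error, you appear to be doing multiclass classification: the array being decoded has shape (30, 3), i.e. 30 samples and 3 class columns.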

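The problem is probably that, at some point in your code, the preprocessing is not being done correctly. The error is raised during inverse binarization with thresholding, by the following function in sklearn's preprocessing module: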

def _inverse_binarize_thresholding(y, output_type, classes, threshold):
    # Excerpt from sklearn/preprocessing/_label.py; the rest of the function is omitted.
    if output_type == "binary" and y.ndim == 2 and y.shape[1] > 2:
        raise ValueError("output_type='binary', but y.shape = {0}".
                         format(y.shape))
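The check above can be triggered outside cross-validation. In this minimal sketch (made-up labels; reproduce_binary_mismatch is a hypothetical helper), a LabelBinarizer that saw only two classes during fit, and therefore has y_type_ == "binary", is asked to decode a 3-column prediction matrix. This suggests, but does not prove, that the binarizer used in your predict() was fitted on different labels than the model producing the raw predictions, for example on a subset containing only two classes.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer


def reproduce_binary_mismatch():
    """Fit on two classes, then decode a 3-column score matrix."""
    lb = LabelBinarizer()
    lb.fit([0, 1, 1, 0])     # only two classes seen, so y_type_ is "binary"
    raw = np.zeros((30, 3))  # 30 samples x 3 class columns
    try:
        lb.inverse_transform(raw)
    except ValueError as exc:
        return lb.y_type_, str(exc)
    return lb.y_type_, None


y_type, message = reproduce_binary_mismatch()
print(y_type)   # binary
print(message)  # the same ValueError text as in the question's traceback
```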

Some transformation or preprocessing step must be missing from your code; you need to use LabelBinarizer correctly.
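A minimal sketch of consistent LabelBinarizer usage, with made-up string labels: the number of output columns depends on how many classes the binarizer saw during fit (one column per class for three or more classes, a single column for two). Whatever produces the matrix later passed to inverse_transform must therefore match the binarizer's fitted state.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Multiclass labels: one output column per class.
y_train = np.array(["a", "b", "c", "a", "c", "b"])
lb = LabelBinarizer()
Y = lb.fit_transform(y_train)
print(lb.y_type_, Y.shape)      # multiclass (6, 3)
print(lb.inverse_transform(Y))  # recovers the original labels

# Binary labels: a single output column.
lb_bin = LabelBinarizer()
Y_bin = lb_bin.fit_transform(["a", "b", "b", "a"])
print(lb_bin.y_type_, Y_bin.shape)  # binary (4, 1)
```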

Read the documentation below and trace the error back through your code to fix it.

documentation
