How to get comparable and reproducible results from LogisticRegressionCV and GridSearchCV

from sklearn import datasets
boston = datasets.load_boston()
X = boston.data
y = boston.target
y[y <= y.mean()] = 0; y[y > 0] = 1

import numpy as np
from sklearn.cross_validation import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import LogisticRegressionCV

fold = KFold(len(y), n_folds=5, shuffle=True, random_state=777)

LogisticRegressionCV

searchCV = LogisticRegressionCV(
    Cs=list(np.power(10.0, np.arange(-10, 10))),
    penalty='l2',
    scoring='roc_auc',
    cv=fold,
    random_state=777,
    max_iter=10000,
    fit_intercept=True,
    solver='newton-cg',
    tol=10,
)
searchCV.fit(X, y)
print('Max auc_roc:', searchCV.scores_[1].max())

Max auc_roc: 0.970588235294

The newton-cg solver is used here only to pin the solver to a fixed value; I tried the other solvers as well. What am I missing?

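P.S. In both cases I also got the warning "/usr/lib64/python3.4/site-packages/sklearn/utils/optimize.py:193: UserWarning: Line Search failed warnings.warn('Line Search failed')", which I don't understand either. I would be glad if someone could explain what it means, but I hope it is unrelated to my main question.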

EDIT: updates

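Following @joeln's comment, I added the max_iter=10000 and tol=10 parameters. This does not change a single digit of the result, but the warning disappeared.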

1 Answer

User

Answer 1 · Posted 2024-06-28 19:42:02

Here is a copy of the answer by Tom on the scikit-learn issue tracker:

LogisticRegressionCV.scores_ gives the score for every fold. GridSearchCV.best_score_ gives the best mean score over all the folds.

To get the same result, you need to change your code:

print('Max auc_roc:', searchCV.scores_[1].max())  # is wrong
print('Max auc_roc:', searchCV.scores_[1].mean(axis=0).max())  # is correct
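The difference between the two expressions can be illustrated with a minimal sketch. The numbers below are made up for illustration and are not results from the Boston data; the array `fold_scores` is a hypothetical stand-in for `searchCV.scores_[1]`, assumed to have one row per fold and one column per candidate C.

```python
import numpy as np

# Hypothetical per-fold AUC scores (illustrative values only).
# Rows are CV folds, columns are candidate values of C.
fold_scores = np.array([
    [0.90, 0.95, 0.93],
    [0.88, 0.91, 0.97],
    [0.92, 0.94, 0.89],
])

# Maximum over every (fold, C) cell: the best score of one lucky fold.
best_single_cell = fold_scores.max()

# Average over folds first, then take the best C: this is the quantity
# that is comparable to GridSearchCV.best_score_.
mean_per_c = fold_scores.mean(axis=0)
best_mean = mean_per_c.max()
best_c_index = int(mean_per_c.argmax())

print('max over all cells:', best_single_cell)
print('max of fold means: ', best_mean, 'at column', best_c_index)
```

In this toy example the single best cell sits in a different column from the best fold average, so `.max()` alone both overstates the score and points at a different C.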

By also using the default tol=1e-4 instead of your tol=10, I get:

[The output block is missing here.]

The remaining (small) difference might come from warm starting in LogisticRegressionCV (which is actually what makes it faster than GridSearchCV).
