Answer to: sklearn Boosting: cross-validation to find the optimal number of estimators without restarting every time



1 answer

Anonymous, 1 day ago

<p>You can fit all 300 estimators once and then use <code>AdaBoostClassifier.staged_predict()</code> to track how the error rate depends on the number of estimators. However, you will have to perform the cross-validation splits yourself; I don't think this approach is compatible with <code>cross_val_score()</code>.</p>
<p>For example:</p>
<pre><code>from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier  # We will use simple stumps for individual estimators in AdaBoost.
from sklearn.metrics import accuracy_score
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
nSamples = {'train': 2000, 'test': 1000}
X = np.random.uniform(size=(nSamples['train'] + nSamples['test'], 2))

# Decision boundary is the unit circle.
in_class = X[:, 0]**2 + X[:, 1]**2 > 1
y = np.zeros(len(X), dtype=int)
y[in_class] = 1

# Add some random error.
error_rate = 0.01
to_flip = np.random.choice(np.arange(len(y)), size=int(error_rate * len(y)), replace=False)
y[to_flip] = 1 - y[to_flip]

# Split training and test.
X = {'train': X[:nSamples['train']],
     'test': X[nSamples['train']:]}
y = {'train': y[:nSamples['train']],
     'test': y[nSamples['train']:]}

# Make AdaBoost Classifier.
max_estimators = 50
ada_boost = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1,  # Just a stump.
                           random_state=np.random.RandomState(0)),
    n_estimators=max_estimators,
    random_state=np.random.RandomState(0))

# Fit all estimators.
ada_boost.fit(X['train'], y['train'])

# Get the train and test accuracy for each stage of prediction.
scores = {'train': [], 'test': []}
for y_predict_train, y_predict_test in zip(ada_boost.staged_predict(X['train']),
                                           ada_boost.staged_predict(X['test'])):
    scores['train'].append(accuracy_score(y['train'], y_predict_train))
    scores['test'].append(accuracy_score(y['test'], y_predict_test))

# Plot the results.
n_estimators = range(1, len(scores['train']) + 1)
for key in scores.keys():
    plt.plot(n_estimators, scores[key])
plt.title('Staged Scores')
plt.ylabel('Accuracy')
plt.xlabel('N Estimators')
plt.legend(list(scores.keys()))
plt.show()
</code></pre>
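<p>The example above uses a single train/test split. A minimal sketch of how the manual cross-validation splits might look is given here: it fits one model per fold and records the staged accuracy on the held-out part. The helper name <code>staged_cv_accuracy</code>, the synthetic data from <code>make_classification</code>, and the choice of <code>StratifiedKFold</code> with 5 folds are assumptions made for illustration, not part of the original answer.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def staged_cv_accuracy(X, y, max_estimators=50, n_splits=5, seed=0):
    """Hypothetical helper: held-out accuracy per fold and per boosting stage.

    Returns an array of shape (n_splits, max_estimators); column k holds the
    accuracy obtained with the first k + 1 estimators.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = np.full((n_splits, max_estimators), np.nan)
    for fold, (train_idx, valid_idx) in enumerate(cv.split(X, y)):
        model = AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=1),  # stump, as in the answer
            n_estimators=max_estimators,
            random_state=seed,
        )
        # One fit per fold; no refit is needed for smaller ensemble sizes.
        model.fit(X[train_idx], y[train_idx])
        # Boosting can stop early, in which case later columns stay NaN.
        for stage, y_pred in enumerate(model.staged_predict(X[valid_idx])):
            fold_scores[fold, stage] = accuracy_score(y[valid_idx], y_pred)
    return fold_scores


# Synthetic data, assumed here only so that the sketch runs on its own.
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=0)
fold_scores = staged_cv_accuracy(X_demo, y_demo)

# Average over folds and pick the ensemble size with the best mean accuracy.
mean_scores = np.nanmean(fold_scores, axis=0)
best_n_estimators = int(np.nanargmax(mean_scores)) + 1
print(best_n_estimators, mean_scores[best_n_estimators - 1])
```

<p>Each fold costs one fit of <code>max_estimators</code> stumps, so the whole search needs <code>n_splits</code> fits rather than one fit per candidate ensemble size.</p>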
