How is the scikit-learn cross_val_predict accuracy score calculated?



1 answer

Anonymous, 1 day ago

No, it is not! According to the <a href="http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation" rel="noreferrer">cross validation doc</a> page, <code>cross_val_predict</code> does not return any scores. It returns only the labels predicted under a particular cross-validation strategy, described as follows:
<blockquote> The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised). </blockquote>
Therefore, by calling <code>accuracy_score(labels, ypred)</code> you are simply computing the accuracy of the labels predicted by that strategy against the true labels. The same documentation page says so again:
<blockquote> These prediction can then be used to evaluate the classifier:
<pre><code>predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
metrics.accuracy_score(iris.target, predicted)
</code></pre>
Note that the result of this computation may be slightly different from those obtained using cross_val_score as the elements are grouped in different ways. </blockquote>
If you need the accuracy score of each fold, you should try:
<pre><code>>>> scores = cross_val_score(clf, X, y, cv=cv)
>>> scores
array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
</code></pre>
For the mean accuracy over all folds, use <code>scores.mean()</code>:
<pre><code>>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.03)
</code></pre>
<hr/>
<h2>How do I calculate the Cohen kappa coefficient and confusion matrix for each fold?</h2>
To calculate the <code>Cohen Kappa coefficient</code> and the confusion matrix, I assume you mean the kappa coefficient and confusion matrix between the true labels and the predicted labels of each fold:
<pre><code>from sklearn.model_selection import KFold
from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# X (features) and labels (targets) are assumed to be NumPy arrays
# with at least 20 samples.
cv = KFold(n_splits=20)
clf = SVC()
for train_index, test_index in cv.split(X):
    # Fit on the training part of this fold, predict its test part.
    clf.fit(X[train_index], labels[train_index])
    ypred = clf.predict(X[test_index])
    kappa_score = cohen_kappa_score(labels[test_index], ypred)
    # Named cm so that it does not shadow the confusion_matrix function.
    cm = confusion_matrix(labels[test_index], ypred)
</code></pre>
<hr/>
<h2>What does <code>cross_val_predict</code> return?</h2>
It uses a splitter such as KFold to split the data into <code>k</code> parts and then runs <code>i=1..k</code> iterations:
<ul>
<li>take the <code>i'th</code> part as the test data and all other parts as the training data</li>
<li>train the model on the training data (all parts except the <code>i'th</code>)</li>
<li>then use this trained model to predict the labels of the <code>i'th</code> part (the test data)</li>
</ul>
In each iteration, the labels of the <code>i'th</code> part of the data are predicted. At the end, <code>cross_val_predict</code> merges the labels predicted for all parts and returns them as the final result. This code shows the process step by step:
<pre><code>import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

X = np.array([[0], [1], [2], [3], [4], [5]])
labels = np.array(['a', 'a', 'a', 'b', 'b', 'b'])

cv = KFold(n_splits=3)
clf = SVC()
# Start with empty strings; each fold fills in its own test indices.
ypred_all = np.full(labels.shape, '', dtype=labels.dtype)
i = 1
for train_index, test_index in cv.split(X):
    print("iteration", i, ":")
    print("train indices:", train_index)
    print("train data:", X[train_index])
    print("test indices:", test_index)
    print("test data:", X[test_index])
    clf.fit(X[train_index], labels[train_index])
    ypred = clf.predict(X[test_index])
    print("predicted labels for data of indices", test_index, "are:", ypred)
    ypred_all[test_index] = ypred
    print("merged predicted labels:", ypred_all)
    i = i + 1
    print("=====================================")
y_cross_val_predict = cross_val_predict(clf, X, labels, cv=cv)
print("predicted labels by cross_val_predict:", y_cross_val_predict)
</code></pre>
The result reported for this code is shown below. The exact predictions may vary with the scikit-learn version and the default <code>SVC</code> settings:
<pre><code>iteration 1 :
train indices: [2 3 4 5]
train data: [[2]
 [3]
 [4]
 [5]]
test indices: [0 1]
test data: [[0]
 [1]]
predicted labels for data of indices [0 1] are: ['b' 'b']
merged predicted labels: ['b' 'b' '' '' '' '']
=====================================
iteration 2 :
train indices: [0 1 4 5]
train data: [[0]
 [1]
 [4]
 [5]]
test indices: [2 3]
test data: [[2]
 [3]]
predicted labels for data of indices [2 3] are: ['a' 'b']
merged predicted labels: ['b' 'b' 'a' 'b' '' '']
=====================================
iteration 3 :
train indices: [0 1 2 3]
train data: [[0]
 [1]
 [2]
 [3]]
test indices: [4 5]
test data: [[4]
 [5]]
predicted labels for data of indices [4 5] are: ['a' 'a']
merged predicted labels: ['b' 'b' 'a' 'b' 'a' 'a']
=====================================
predicted labels by cross_val_predict: ['b' 'b' 'a' 'b' 'a' 'a']
</code></pre>
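To make the difference between the two ways of scoring concrete, here is a minimal sketch that computes both on the iris data. The linear-kernel <code>SVC</code>, the 4-fold shuffled <code>KFold</code>, and all variable names are assumptions chosen for illustration and are not part of the answer above. It shows that the accuracy computed from <code>cross_val_predict</code> pools every out-of-fold prediction into one score, whereas <code>cross_val_score</code> returns one score per fold, so its plain mean can differ slightly when the folds are not all the same size.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1)  # illustrative choice of classifier
# 150 samples in 4 folds gives test folds of 38, 38, 37 and 37 samples.
cv = KFold(n_splits=4, shuffle=True, random_state=0)

# One out-of-fold prediction per sample, scored once over the whole dataset.
predicted = cross_val_predict(clf, X, y, cv=cv)
pooled_accuracy = accuracy_score(y, predicted)

# One accuracy per fold, then averaged with equal weight per fold.
fold_scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
mean_fold_accuracy = fold_scores.mean()

# Weighting each fold's accuracy by its test-set size recovers the pooled value.
fold_sizes = [len(test_index) for _, test_index in cv.split(X)]
weighted_fold_accuracy = np.average(fold_scores, weights=fold_sizes)

print("pooled accuracy (cross_val_predict):", pooled_accuracy)
print("per-fold accuracies (cross_val_score):", fold_scores)
print("unweighted mean of fold accuracies:", mean_fold_accuracy)
print("size-weighted mean of fold accuracies:", weighted_fold_accuracy)
```

With equal-sized folds the two numbers coincide. Otherwise the pooled figure equals the size-weighted mean of the per-fold scores, which is one way to read the documentation's remark that the elements are grouped in different ways.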
