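<p>The industry-standard way to judge prediction accuracy is the receiver operating characteristic (ROC) curve. With the code below, you can build one from your data using sklearn and matplotlib.</p>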
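<p>The ROC curve is a two-dimensional plot of the true positive rate against the false positive rate. You want the curve to lie above the diagonal, the higher the better. The area under the curve (AUC) is the standard summary of accuracy: 0.5 corresponds to the diagonal (random guessing), 1.0 to a perfect classifier, and the larger the value, the better the classifier.</p>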
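<p>To make these definitions concrete, here is a minimal pure-Python sketch of how the ROC points and the AUC can be computed by hand. The labels and scores are made-up toy values, and <code>roc_points</code> and <code>trapezoid_auc</code> are hypothetical helper names for illustration only; this is not how sklearn implements it internally.</p>

```python
# Hypothetical toy data, made up for illustration:
# true labels (0 or 1) and the classifier's scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]


def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists with one point per distinct threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Start at the origin: with a threshold above every score,
    # nothing is predicted positive.
    fpr, tpr = [0.0], [0.0]
    # Lower the threshold step by step, from the highest score down.
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        tpr.append(tp / n_pos)  # true positive rate
        fpr.append(fp / n_neg)  # false positive rate
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve, by the trapezoidal rule."""
    return sum(
        (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
        for i in range(len(fpr) - 1)
    )


fpr, tpr = roc_points(y_true, y_score)
auc_value = trapezoid_auc(fpr, tpr)
print(fpr, tpr, auc_value)
```

<p>For these four samples the curve passes through (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), so the area comes out to 0.75: better than the 0.5 of the diagonal, but short of a perfect 1.0.</p>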
<pre><code>import pandas as pd
# read data
df = pd.read_csv('sample_data.csv', header=None, names=['classifier','category'])
# remove values that are not 0 or 1 (two of those)
df = df.loc[(df.category==1.0) | (df.category==0.0),:]
# examine data frame
df.head()
from matplotlib import pyplot as plt
# add this magic if you're in a notebook
# %matplotlib inline
from sklearn.metrics import roc_curve, auc
# matplot figure
figure, ax1 = plt.subplots(figsize=(8,8))
# create ROC itself
fpr,tpr,_ = roc_curve(df.category,df.classifier)
# compute AUC
roc_auc = auc(fpr,tpr)
# plotting bells and whistles
ax1.plot(fpr,tpr, label='%s (area = %0.2f)' % ('Classifier',roc_auc))
# dashed black diagonal: the random-guess baseline
ax1.plot([0, 1], [0, 1], 'k--')
ax1.set_xlim([0.0, 1.0])
ax1.set_ylim([0.0, 1.0])
ax1.set_xlabel('False Positive Rate', fontsize=18)
ax1.set_ylabel('True Positive Rate', fontsize=18)
ax1.set_title("Receiver Operating Characteristic", fontsize=18)
plt.tick_params(axis='both', labelsize=18)
ax1.legend(loc="lower right", fontsize=14)
plt.grid(True)
# plt.show() blocks until the window is closed, so it also works in a plain script
plt.show()
</code></pre>
<p>With your data, you should get a plot like this:
<a href="https://i.stack.imgur.com/wbmDu.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/wbmDu.png" alt="enter image description here"/></a></p>