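Sensitivity-specificity curve

2 answers

User

Answer 1 · edited 2024-09-28 22:47:47

Building on @ApproachingDarknessFish's answer, you can fit a variety of distributions to the resulting histograms, and not all of them extend outside [0,1]. The beta distribution, for example, captures most unimodal distributions on [0,1] reasonably well, at least for visualization purposes: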

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

test_y = np.array([0]*100 + [1]*100)
predicted_y_probs = np.concatenate((np.random.beta(2,5,100), np.random.beta(8,3,100)))

def estimate_beta(X):
    xbar = np.mean(X)
    vbar = np.var(X,ddof=1)
    alphahat = xbar*(xbar*(1-xbar)/vbar - 1)
    betahat = (1-xbar)*(xbar*(1-xbar)/vbar - 1)
    return alphahat, betahat

positive_beta_estimates = estimate_beta(predicted_y_probs[test_y == 1])
negative_beta_estimates = estimate_beta(predicted_y_probs[test_y == 0])

unit_interval = np.linspace(0,1,100)
plt.plot(unit_interval, scipy.stats.beta.pdf(unit_interval, *positive_beta_estimates), c='r', label="positive")
plt.plot(unit_interval, scipy.stats.beta.pdf(unit_interval, *negative_beta_estimates), c='g', label="negative")

# Show the threshold.
plt.axvline(0.5, c='black', ls='dashed')
plt.xlim(0,1)

# Add labels
plt.legend()
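
The fitted curves can also be read as rough numbers: the area under the positive curve to the right of the threshold approximates sensitivity, and the area under the negative curve to the left of it approximates specificity. A minimal sketch, assuming the beta fit is adequate. The parameters below are the ones used to simulate the data above and only stand in for the estimates; `threshold` is an illustrative name.

```python
import scipy.stats

# Assumption: stand-ins for positive_beta_estimates and negative_beta_estimates.
positive_params = (8, 3)
negative_params = (2, 5)
threshold = 0.5

# Sensitivity: mass of the positive distribution above the threshold.
approx_sensitivity = scipy.stats.beta.sf(threshold, *positive_params)
# Specificity: mass of the negative distribution below the threshold.
approx_specificity = scipy.stats.beta.cdf(threshold, *negative_params)

print(f"sensitivity ~ {approx_sensitivity:.3f}, specificity ~ {approx_specificity:.3f}")
```

These are properties of the fitted curves, not of the observed predictions, so they are only as good as the fit.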

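User

Answer 2 · edited 2024-09-28 22:47:47

I don't think that plot shows what you think it shows. As the threshold drops to zero, sensitivity approaches 1, because 100% of observations are classified as positive and the false negative rate falls to zero. Likewise, as the threshold approaches 1, selectivity approaches 1, because every observation is classified as negative and the false positive rate is zero. So that plot does not show sensitivity or selectivity.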
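
To make the limiting behavior concrete, here is a small sketch with made-up scores. The arrays and the helper name `sensitivity_specificity` are illustrative and not taken from the question.

```python
import numpy as np

def sensitivity_specificity(y_true, probs, threshold):
    """Return (sensitivity, specificity) when probs >= threshold counts as positive."""
    y_pred = probs >= threshold
    positives = y_true == 1
    sensitivity = np.mean(y_pred[positives])     # TP / (TP + FN)
    specificity = np.mean(~y_pred[~positives])   # TN / (TN + FP)
    return sensitivity, specificity

# Made-up labels and scores, for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 1])
probs = np.array([0.1, 0.4, 0.6, 0.3, 0.7, 0.9])

for t in (0.0, 0.5, 1.0):
    print(t, sensitivity_specificity(y_true, probs, t))
```

At a threshold of 0 everything is called positive (sensitivity 1, specificity 0), and at a threshold above every score everything is called negative (sensitivity 0, specificity 1).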

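To plot selectivity and sensitivity as functions of the threshold on the x-axis, we can use the built-in ROC functionality, extract the values from it, and plot them our own way. Given a vector of binary labels test_y, a matrix of associated predictors test_x, and a fitted RandomForestClassifier object rfc: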

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import recall_score

# Get the estimated probability of each observation being categorized as positive.
# Column 1 corresponds to rfc.classes_[1], the positive label when the classes are [0, 1];
# use [:,0] for the probability of the negative class.
predicted_y_probs = rfc.predict_proba(test_x)[:,1]

thresholds = np.linspace(0,1,20) # or however many points you want

# Sensitivity is recall on the positive class;
# selectivity (specificity) is recall on the negative class.
predictions = [(predicted_y_probs >= t).astype(int) for t in thresholds]
sensitivities = [recall_score(test_y, p) for p in predictions]
selectivities = [recall_score(test_y, p, pos_label=0) for p in predictions]
plt.plot(thresholds, sensitivities, label='sensitivity')
plt.plot(thresholds, selectivities, label='selectivity')
plt.xlabel('threshold')
plt.legend()
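
The same curves can also be pulled out of scikit-learn's `roc_curve`, which evaluates the rates at the thresholds that actually occur in the scores. A minimal sketch, assuming synthetic labels and scores in place of `test_y` and the `rfc` output:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Assumption: synthetic labels and scores standing in for test_y and predicted_y_probs.
rng = np.random.default_rng(0)
test_y = np.array([0] * 100 + [1] * 100)
predicted_y_probs = np.concatenate((rng.beta(2, 5, 100), rng.beta(8, 3, 100)))

# roc_curve returns the false positive rate and true positive rate at each threshold.
fpr, tpr, roc_thresholds = roc_curve(test_y, predicted_y_probs)

# The first threshold is an artificial "nothing is predicted positive" point
# lying above every score, so drop it before plotting against the threshold.
roc_thresholds, sensitivity, specificity = roc_thresholds[1:], tpr[1:], 1 - fpr[1:]

plt.plot(roc_thresholds, sensitivity, label='sensitivity')
plt.plot(roc_thresholds, specificity, label='selectivity')
plt.xlabel('threshold')
plt.legend()
```

Here sensitivity is the true positive rate and selectivity is one minus the false positive rate, so no separate metric calls are needed.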

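This will not, however, recreate the plot you provided as a reference, which appears to show the distribution of the estimated probability of each observation being classified as positive. In other words, the threshold in that plot is a constant, and the x-axis shows where each prediction falls relative to that (stationary) threshold. It does not directly tell us sensitivity or selectivity. If you really want a plot like that, keep reading.

I can't think of a way to reconstruct those smooth curves, since a density plot would extend below 0 and above 1, but we can show the same information with histograms. Using the same variables as before: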

[code block not preserved in this copy]
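
The original snippet for this step is not available here. As a rough stand-in, a minimal sketch of the histogram idea, assuming synthetic beta-distributed scores in place of the `rfc` output. The bin count, colors, and the names `hist_kwargs`, `pos_counts`, and `neg_counts` are my own choices, not necessarily the answer's.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: synthetic labels and scores standing in for test_y and predicted_y_probs.
rng = np.random.default_rng(0)
test_y = np.array([0] * 100 + [1] * 100)
predicted_y_probs = np.concatenate((rng.beta(2, 5, 100), rng.beta(8, 3, 100)))

# Fix the range so both groups share the same bin edges on [0, 1].
hist_kwargs = dict(bins=20, range=(0, 1), alpha=0.5)
pos_counts, _, _ = plt.hist(predicted_y_probs[test_y == 1], color='r',
                            label='positive', **hist_kwargs)
neg_counts, _, _ = plt.hist(predicted_y_probs[test_y == 0], color='g',
                            label='negative', **hist_kwargs)

# Show the threshold.
plt.axvline(0.5, c='black', ls='dashed')
plt.xlim(0, 1)
plt.legend()
```

Passing an explicit `range` keeps the two histograms comparable bar for bar, which matters when one class is concentrated in a narrow band of scores.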

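I ran this code on the classic Iris dataset, using only two of its three species, and got the following output. Versicolor is "positive", virginica is "negative", and setosa was ignored to produce a binary classification. Note that my model had perfect recall, so all the probabilities for versicolor are very close to 1.0. With only 100 samples, most of them correctly classified, the plot is rather coarse, but hopefully it gets the idea across.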
