Is the weighted kappa computed by the irr package in R wrong?

The unweighted kappa is the same in R's irr and Python's scikit-learn, but the weighted kappa is not.

R, using irr:

library(irr)
label <- read.csv('label_test.csv', header = FALSE)
pred <- read.csv('pred_test.csv', header = FALSE)

kapp <- kappa2(data.frame(label, pred), "unweighted")
kappa <- getElement(kapp, "value")
print(kappa)  # output: 0.245283

w_kapp <- kappa2(data.frame(label, pred), "equal")
weighted_kappa <- getElement(w_kapp, "value")
print(weighted_kappa)  # output: 0.443038

Python, using scikit-learn:

import numpy as np
import pandas as pd
from sklearn.metrics import cohen_kappa_score

label_file = 'label_test.csv'
pred_file = 'pred_test.csv'
# ravel() flattens the single CSV column into a 1-D integer array
label = pd.read_csv(label_file, header=None).to_numpy().ravel().astype(int)
pred = pd.read_csv(pred_file, header=None).to_numpy().ravel().astype(int)

kappa = cohen_kappa_score(label, pred)
print(kappa)  # output: 0.24528301886792447

# Linear weights over the label set 0..99
weighted_kappa = cohen_kappa_score(label, pred, weights='linear', labels=np.arange(100))
print(weighted_kappa)  # output: 0.8359908883826879
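Since both libraries agree on the unweighted value, the difference must lie in the weighting. As a sanity check, here is a minimal from-scratch sketch of linear-weighted kappa in plain Python. It assumes the two CSV files hold the eight ratings printed in the second answer. It uses the identity kappa = 1 - (observed disagreement) / (chance disagreement), where disagreement is |label - pred| for linear weights and 0/1 for unweighted kappa. The helper name kappa_from_distance is hypothetical.

```python
from itertools import product

# Assumed contents of label_test.csv / pred_test.csv (one rating per row)
label = [0, 1, 1, 1, 0, 14, 53, 3]
pred = [0, 1, 1, 0, 3, 4, 54, 6]


def kappa_from_distance(a, b, dist):
    """Cohen's kappa as 1 - observed / chance-expected disagreement.

    dist(x, y) is the disagreement between two ratings:
    0/1 gives unweighted kappa, |x - y| gives linear weights.
    """
    n = len(a)
    observed = sum(dist(x, y) for x, y in zip(a, b)) / n
    # Chance disagreement: pair every label with every prediction
    expected = sum(dist(x, y) for x, y in product(a, b)) / (n * n)
    return 1 - observed / expected


unweighted = kappa_from_distance(label, pred, lambda x, y: float(x != y))
linear = kappa_from_distance(label, pred, lambda x, y: abs(x - y))
print(round(unweighted, 6))  # 0.245283
print(round(linear, 6))      # 0.835991
```

Both numbers match scikit-learn, so 0.836 is what linear weights on the raw rating values produce for this data.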

2 answers

User

Answer 1 · edited 2024-09-29 02:27:22

I emailed the package author, and he said he will fix the bug in the next update.

His reply:

Actually, I am aware of this awkward behavior of the kappa2 function. It is due to the conversion and reordering of factor levels. These are actually not two bugs but a single one that results in an incorrect confusion matrix (which you already found out). You can easily fix it by deleting the first line of the kappa2 function ("ratings <- as.matrix(na.omit(ratings))"). This conversion to numeric values as part of removing NA ratings is responsible for the error.
In general, my function needs to know the factor levels in order to correctly compute kappa. Thus, for your data, you would need to store the values as factors with the appropriate possible factor levels. E.g.
label <- c(0, 1, 1, 1, 0, 14, 53, 3)
label <- factor(label, levels = 0:100)
pred <- c(0, 1, 1, 0, 3, 4, 54, 6)
pred <- factor(pred, levels = 0:100)
ratings <- data.frame(label, pred)
When you now run the modified kappa2 function (i.e. without the first line), the results should be correct.
kappa2(ratings)           # unweighted
kappa2(ratings, "equal")  # weighted kappa with equal weights
For the next update of my package, I will take this into account.
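The author's remark that kappa2 "needs to know the factor levels" can be made concrete with a small Python sketch. It assumes (our reading, not confirmed by the irr documentation) that equal weights are spaced by each level's position in the level list rather than by its numeric value. It also assumes that, for the numeric CSV input, kappa2 ends up using only the observed values sorted as strings. Under those assumptions, the sketch reproduces the question's 0.443038, while the full 0..100 level set the author recommends gives scikit-learn's 0.835991. position_kappa is a hypothetical helper, not part of any library.

```python
# Hypothetical sketch: equal-spacing weights assigned by a level's *position*
# in the level list, not by its numeric value. Not irr's actual code.
label = [0, 1, 1, 1, 0, 14, 53, 3]
pred = [0, 1, 1, 0, 3, 4, 54, 6]


def position_kappa(a, b, levels):
    """Weighted kappa with agreement weight 1 - |i - j| / (k - 1),
    where i and j are the positions of the two ratings in `levels`."""
    pos = {lev: i for i, lev in enumerate(levels)}
    k = len(levels)
    n = len(a)

    def weight(x, y):
        return 1 - abs(pos[x] - pos[y]) / (k - 1)

    p_obs = sum(weight(x, y) for x, y in zip(a, b)) / n
    # Chance agreement: every label paired with every prediction
    p_exp = sum(weight(x, y) for x in a for y in b) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


observed = sorted(set(label) | set(pred))  # numeric order: 0, 1, 3, 4, 6, 14, 53, 54
lexicographic = sorted(observed, key=str)  # string order: 0, 1, 14, 3, 4, 53, 54, 6
full = list(range(101))                    # all levels 0..100, as the author suggests

print(round(position_kappa(label, pred, lexicographic), 6))  # 0.443038
print(round(position_kappa(label, pred, full), 6))           # 0.835991
```

Sorting the same eight observed levels numerically gives yet another value (about 0.573). Position-based weights therefore match linear weights on the raw ratings only when every value from 0 upward is a level. As far as we can tell, scikit-learn's linear weights are also computed from label positions, so the labels=range(100) argument in the question is what makes position and value coincide there.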

User

Answer 2 · edited 2024-09-29 02:27:22

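The author's solution doesn't work, because the kappa2 code converts your ratings to a matrix, and once factors are converted to a matrix their levels are lost. This is the line:

ratings <- as.matrix(na.omit(ratings))

You can try it on your data; the factors are converted to character: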

lvl = 0:100
ratings = data.frame(label = factor(label[,1],levels=lvl),
                     pred = factor(pred[,1],levels=lvl))

 as.matrix(ratings)
     label pred
[1,] "0"   "0" 
[2,] "1"   "1" 
[3,] "1"   "1" 
[4,] "1"   "0" 
[5,] "0"   "3" 
[6,] "14"  "4" 
[7,] "53"  "54"
[8,] "3"   "6"

The result is still wrong (it does not match scikit-learn's 0.836):

kappa2(ratings,weight="equal")
 Cohen's Kappa for 2 Raters (Weights: equal)

 Subjects = 8 
   Raters = 2 
    Kappa = 0.368 

        z = 1.79 
  p-value = 0.0742
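The 0.368 is consistent with the same position-based weighting, just with a different level order. As a hedged illustration, suppose that once the factors become strings, kappa2 ends up with the level order 0, 1, 3, 4, 54, 6, 14, 53: the prediction column's values sorted as strings, followed by the label column's remaining values. That order is our reconstruction, not documented irr behavior. The sketch only checks that it yields the printed 0.368 when disagreement is measured in level positions.

```python
from itertools import product

label = [0, 1, 1, 1, 0, 14, 53, 3]
pred = [0, 1, 1, 0, 3, 4, 54, 6]

# Hypothetical level order after the factors are turned into strings.
# This is a reconstruction, not taken from irr's documentation.
levels = ["0", "1", "3", "4", "54", "6", "14", "53"]
pos = {lev: i for i, lev in enumerate(levels)}


def dist(x, y):
    # Disagreement measured in level positions, not in rating values
    return abs(pos[str(x)] - pos[str(y)])


n = len(label)
observed = sum(dist(x, y) for x, y in zip(label, pred)) / n
expected = sum(dist(x, y) for x, y in product(label, pred)) / (n * n)
kappa_w = 1 - observed / expected
print(round(kappa_w, 3))  # 0.368
```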

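I recommend DescTools instead. You only need to pass it the confusion matrix built with R's table() function, with the factors declared correctly as above: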

library(DescTools)

CohenKappa(table(ratings$label,ratings$pred), weight="Unweighted")
[1] 0.245283

CohenKappa(table(ratings$label,ratings$pred), weight="Equal-Spacing")
[1] 0.8359909
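The DescTools call works because table() on factors with levels 0:100 gives a 101 × 101 confusion matrix in which a rating's row or column index equals its value. Below is a minimal NumPy sketch of that computation. It assumes that "Equal-Spacing" means agreement weights 1 - |i - j| / (k - 1), which is our reading of the DescTools option, not its source code. The helper weighted_kappa is hypothetical.

```python
import numpy as np

label = np.array([0, 1, 1, 1, 0, 14, 53, 3])
pred = np.array([0, 1, 1, 0, 3, 4, 54, 6])
k = 101  # levels 0..100, as declared with factor(..., levels = 0:100)

# Confusion matrix analogous to table(label, pred) with fixed levels
table = np.zeros((k, k))
np.add.at(table, (label, pred), 1)
p = table / table.sum()

# Chance-expected proportions from the row (label) and column (pred) margins
expected = np.outer(p.sum(axis=1), p.sum(axis=0))


def weighted_kappa(p, expected, weights):
    po = (weights * p).sum()         # weighted observed agreement
    pe = (weights * expected).sum()  # weighted chance agreement
    return (po - pe) / (1 - pe)


i, j = np.indices((k, k))
unweighted = weighted_kappa(p, expected, (i == j).astype(float))
equal_spacing = weighted_kappa(p, expected, 1 - np.abs(i - j) / (k - 1))
print(round(unweighted, 6))     # 0.245283
print(round(equal_spacing, 7))  # 0.8359909
```

The (k - 1) factor cancels between the observed and expected terms. Using 100 levels (as with the question's labels=range(100)) or 101 levels therefore gives the same kappa, provided every rating falls inside the range.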
