How do I compute Cohen's kappa coefficient to measure inter-rater agreement? (movie reviews)

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold, StratifiedShuffleSplit, cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB

xlsx1 = pd.ExcelFile('App-Music/reviews.xlsx')

# Reviews are stored in two columns: one for the review text, one for the rating.
sheet = pd.read_excel(xlsx1, sheet_name='Sheet1')
X = sheet.Review
Y = sheet.Rating

# The post passed an undefined X_documents here; X is assumed to be what was meant.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, stratify=Y)

new_vect = TfidfVectorizer(ngram_range=(1, 2), stop_words='english')
X_train_dtm = new_vect.fit_transform(X_train.values.astype('U'))
# Reuse the vocabulary learned on the training set; do not refit on the test set.
X_test_dtm = new_vect.transform(X_test.values.astype('U'))

# new_model was never defined in the post; MultinomialNB is assumed from the imports.
new_model = MultinomialNB()
new_model.fit(X_train_dtm, Y_train)
new_model.score(X_test_dtm, Y_test)

# This is the part where I want to calculate the Cohen's kappa score for comparison.

1 answer

User

#1 · Posted 2024-09-28 22:13:24

As stated in the documentation of cohen_kappa_score:

The kappa statistic is symmetric, so swapping y1 and y2 doesn’t change the value.

This metric has no y_pred or y_true. The signature you mention in your post is

sklearn.metrics.cohen_kappa_score(y1, y2, labels=None, weights=None)

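In this setting there is no such thing as a correct value and a predicted value. There are only labels assigned by two different people, and those labels may differ because of each person's views and understanding of the topic.

You only need to provide two lists (or arrays) of labels, each produced by a different annotator. The order in which you pass them does not matter.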
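The symmetry can be checked with a small sketch. The two label lists are hypothetical annotations of the same six reviews, invented for illustration only.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels given to the same six reviews by two annotators.
annotator_a = ["pos", "neg", "pos", "neu", "neg", "pos"]
annotator_b = ["pos", "neg", "neu", "neu", "neg", "neg"]

# Swapping the arguments should give the same score.
kappa_ab = cohen_kappa_score(annotator_a, annotator_b)
kappa_ba = cohen_kappa_score(annotator_b, annotator_a)
print(kappa_ab, kappa_ba)
```

Both calls should print the same value, since neither list is treated as the ground truth.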

Edit 1

You said you have text reviews. In that case, you need to apply some feature-extraction process to identify the labels.

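This metric measures the agreement between two people labelling the same data, such as assigning a class to each data sample. It cannot be applied directly to raw text.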
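To make the notion of agreement concrete, here is a minimal sketch of the standard kappa formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of matching labels and p_e is the agreement expected by chance from each annotator's label frequencies. The function name cohen_kappa and the sample labels are hypothetical; in practice sklearn.metrics.cohen_kappa_score does this for you.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)

    # p_o: fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # p_e: chance agreement, from how often each annotator uses each label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical class labels, i.e. categories rather than raw review text.
annotator_a = ["pos", "neg", "pos", "neu", "neg", "pos"]
annotator_b = ["pos", "neg", "neu", "neu", "neg", "neg"]
print(cohen_kappa(annotator_a, annotator_b))
```

Note that the inputs are class labels, which is why raw reviews need a labelling or prediction step first.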

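Edit 2: Assuming Y contains only integers (perhaps review ratings from 1 to 10), this becomes a multi-class classification problem. It is supported by the scikit-learn implementation of cohen_kappa_score.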
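As a rough illustration of the multi-class case, the sketch below scores two hypothetical lists of integer ratings. The weights argument comes from the signature quoted above. Using "quadratic" for ordered ratings is a suggestion on my part, not something the question asked for: it penalises near misses (4 vs 5) less than distant ones (1 vs 5).

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical integer ratings (1 to 5) given to the same eight reviews.
rater_1 = [1, 2, 3, 4, 5, 5, 3, 2]
rater_2 = [1, 3, 3, 5, 5, 4, 2, 2]

# Default: every disagreement counts the same, whatever its size.
unweighted = cohen_kappa_score(rater_1, rater_2)

# Quadratic weights: disagreements are penalised by squared distance.
quadratic = cohen_kappa_score(rater_1, rater_2, weights="quadratic")

print(unweighted, quadratic)
```

Here every disagreement is off by exactly one point, so the weighted score should come out higher than the unweighted one.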

If I understand the sentiment-analysis link you posted correctly, then you should do:

from sklearn.metrics import cohen_kappa_score

Y_pred = new_model.predict(X_test_dtm)
cohen_score = cohen_kappa_score(Y_test, Y_pred)
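For readers without the Excel file, the same flow can be sketched end to end on in-memory data. The reviews, ratings, and variable names below are invented for illustration, and the data is split by hand into training and test sets. It is a toy setup, not the asker's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import cohen_kappa_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data: review text and an integer rating (1 = bad, 5 = good).
train_reviews = [
    "great movie loved the acting",
    "wonderful story and great cast",
    "loved every minute brilliant film",
    "terrible plot boring acting",
    "awful film boring and slow",
    "hated the story terrible cast",
]
train_ratings = [5, 5, 5, 1, 1, 1]

# Hypothetical held-out reviews with their human-assigned ratings.
test_reviews = [
    "great cast loved the film",
    "boring and terrible movie",
    "brilliant acting wonderful story",
    "awful slow plot",
]
test_ratings = [5, 1, 5, 1]

# Fit the vocabulary on the training reviews only, then reuse it on the test set.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X_train_dtm = vectorizer.fit_transform(train_reviews)
X_test_dtm = vectorizer.transform(test_reviews)

model = MultinomialNB()
model.fit(X_train_dtm, train_ratings)

# Treat the model as a second "rater" and compare it with the human ratings.
predicted = model.predict(X_test_dtm)
kappa = cohen_kappa_score(test_ratings, predicted)
print(kappa)
```

Here the model plays the role of the second annotator, so kappa reports its agreement with the human ratings after correcting for chance.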
