_joint_log_likelihood may be giving me wrong values

Posted 2024-10-01 22:40:46


I have code like this:

x_train=data['TOKEN'].loc[:2]
y=data['label'].loc[:2]
x_test=data['TOKEN'].loc[3:]

The data contains 3 training documents, one for each class (-1, 0 and 1), plus 1 test document.

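# TF-IDF on the training data
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(smooth_idf=False, norm=None)
x_tfidf_train = tfidf.fit_transform(x_train)
# fit_transform returns a sparse matrix, so convert it before building the DataFrame
# (get_feature_names_out() replaces get_feature_names() in recent scikit-learn versions)
tfidfframe_train = pd.DataFrame(x_tfidf_train.toarray(), columns=tfidf.get_feature_names_out())
# the output of tfidfframe_train
    a       b       c       d       e       f
0   0.0     0.0     0.0     1.477   1.477   1.0  -> class -1, training doc1
1   0.0     0.0     1.176   0.0     0.0     1.0  -> class 0, training doc2
2   1.477   1.477   1.176   0.0     0.0     1.0  -> class 1, training doc3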

# TF-IDF on the test data
x_tfidf_test = tfidf.transform(x_test)
tfidfframe_test = pd.DataFrame(x_tfidf_test.toarray(), columns=tfidf.get_feature_names_out())
    a     b     c      d     e     f
0   0.0   0.0   1.176  0.0   0.0   1.0

So the test document contains two words, c and f. I fit the training data to MultinomialNB:

from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB(alpha=1.0)
classifier = model.fit(x_tfidf_train, y)
print('class log prior \n', classifier.class_log_prior_)
# output (log base 10): log10(1/3) = -0.47712125, so this output is correct
class log prior
 [-0.47712125 -0.47712125 -0.47712125]

print('Conditional Probabilities :\n', classifier.feature_log_prob_)  # conditional probabilities P(w|c)
# output: this is actually correct. It comes from plugging the training TF-IDF values above into the log10 P(w|c) calculation
      a           b           c           d           e           f
[[-0.99800822 -0.99800822 -0.99800822 -0.60406095 -0.60406095 -0.69697822]   -> class -1, training doc1
 [-0.91254573 -0.91254573 -0.57486863 -0.91254573 -0.91254573 -0.61151573]   -> class 0, training doc2
 [-0.65256092 -0.65256092 -0.70883108 -1.04650819 -1.04650819 -0.74547819]]  -> class 1, training doc3
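For readers who want to check the table above, here is a minimal NumPy sketch of the P(w|c) calculation with alpha = 1 smoothing. It assumes the TF-IDF weights are 1 + log10(N / df) with N = 3, which matches the posted 1.477 and 1.176, and it uses base-10 logs only to follow the post's convention (stock scikit-learn reports natural logs). The names `X_train`, `smoothed` and `feature_log_prob` are illustrative, not part of the original code.

```python
import numpy as np

# TF-IDF training matrix as posted (rows: classes -1, 0, 1; columns: a..f).
# Assumption: weights are 1 + log10(N / df) with N = 3 documents.
w1 = 1 + np.log10(3 / 1)   # word appears in one document  (about 1.477)
w2 = 1 + np.log10(3 / 2)   # word appears in two documents (about 1.176)
X_train = np.array([
    [0.0, 0.0, 0.0, w1,  w1,  1.0],
    [0.0, 0.0, w2,  0.0, 0.0, 1.0],
    [w1,  w1,  w2,  0.0, 0.0, 1.0],
])

alpha = 1.0
# Each class has exactly one document, so the per-class feature totals are the rows.
smoothed = X_train + alpha
feature_log_prob = np.log10(smoothed / smoothed.sum(axis=1, keepdims=True))
print(np.round(feature_log_prob, 8))
```

Under these assumptions the printed rows agree with the posted values to the displayed precision, which supports the claim that this part of the output is fine.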

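The problem comes when I try to compute the class scores for the test document and pick the maximum. The score should be log P(c) plus the sum of log P(w|c) over the words, which sklearn exposes as _joint_log_likelihood.

So we can compute it by hand for the test words [c, f]: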

     c             f          log10 P(c)
-0.99800822 + -0.69697822 + -0.47712125 = -2.17210769 -> class -1
-0.57486863 + -0.61151573 + -0.47712125 = -1.66350558 -> class 0
-0.70883108 + -0.74547819 + -0.47712125 = -1.93143052 -> class 1

But when I get the same quantity from the library, the output does not match:

jll = classifier._joint_log_likelihood(x_tfidf_test)
# output, columns ordered left to right as classes (-1, 0, 1)
     class -1    class 0     class 1
[[-2.34784822 -1.76473496 -2.05624949]]

What is wrong with MultinomialNB's _joint_log_likelihood? The source code for MultinomialNB in naive_bayes.py says:

 def _joint_log_likelihood(self, X):
        """Calculate the posterior log probability of the samples X"""
        return (safe_sparse_dot(X, self.feature_log_prob_.T) +
                self.class_log_prior_)
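One way to read the quoted source is that each log P(w|c) is multiplied by the feature value in X, so a TF-IDF weight of about 1.176 for c counts as more than one occurrence. The sketch below is a NumPy check using only the numbers posted above; `x_test`, `x_binary`, `jll` and `manual` are illustrative names, and the test weight for c is assumed to be 1 + log10(3/2).

```python
import numpy as np

# Values copied from the post (base-10 logs; rows are classes -1, 0, 1).
feature_log_prob = np.array([
    [-0.99800822, -0.99800822, -0.99800822, -0.60406095, -0.60406095, -0.69697822],
    [-0.91254573, -0.91254573, -0.57486863, -0.91254573, -0.91254573, -0.61151573],
    [-0.65256092, -0.65256092, -0.70883108, -1.04650819, -1.04650819, -0.74547819],
])
class_log_prior = np.full(3, -0.47712125)

# Test document: TF-IDF weights for c and f, zero for every other word.
x_test = np.array([[0.0, 0.0, 1 + np.log10(3 / 2), 0.0, 0.0, 1.0]])

# Same expression as the quoted source, written with dense arrays.
jll = x_test @ feature_log_prob.T + class_log_prior

# The hand calculation: every word that is present counts exactly once.
x_binary = (x_test > 0).astype(float)
manual = x_binary @ feature_log_prob.T + class_log_prior

print("weighted by TF-IDF:", jll)
print("each word once:    ", manual)
```

With these inputs the weighted version lands on the library output and the unweighted version lands on the hand calculation, so the gap appears to come from the 1.176 weight on c rather than from the formula itself. Both versions rank class 0 highest.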

Maybe you can review this and tell me what is going on. This is the data: Data. I hope you can answer.


0 answers

There are no answers yet.
