How do I combine the outputs of multiple naive Bayes classifiers?

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
import numpy as np

iris = datasets.load_iris()

gnb1 = GaussianNB()
gnb2 = GaussianNB()
gnb3 = GaussianNB()
gnb4 = GaussianNB()

# The actual dataset has 3 classes; I collapsed it into 2 classes for this demo
target = np.where(iris.target, 2, 1)

# Fit one classifier per feature column
gnb1.fit(iris.data[:, 0].reshape(150, 1), target)
gnb2.fit(iris.data[:, 1].reshape(150, 1), target)
gnb3.fit(iris.data[:, 2].reshape(150, 1), target)
gnb4.fit(iris.data[:, 3].reshape(150, 1), target)

# y_pred = gnb.predict(iris.data)
index = 0
y_prob1 = gnb1.predict_proba(iris.data[index, 0].reshape(1, 1))
y_prob2 = gnb2.predict_proba(iris.data[index, 1].reshape(1, 1))
y_prob3 = gnb3.predict_proba(iris.data[index, 2].reshape(1, 1))
y_prob4 = gnb4.predict_proba(iris.data[index, 3].reshape(1, 1))

# print(y_prob1, "\n", y_prob2, "\n", y_prob3, "\n", y_prob4)

# I just added them up over all classifiers for each class
pos = y_prob1[:, 1] + y_prob2[:, 1] + y_prob3[:, 1] + y_prob4[:, 1]
neg = y_prob1[:, 0] + y_prob2[:, 0] + y_prob3[:, 0] + y_prob4[:, 0]

print(pos)
print(neg)

1 answer

User

#1 · Posted 2024-10-05 10:05:57

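First of all, why are you doing this? You should have one naive Bayes classifier here, not one per feature. It looks like you do not understand the idea behind the classifier. What you did is actually what naive Bayes does internally: it treats each feature independently. But since these are probabilities, you should multiply them or add their logarithms, not add them. So: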

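1. You should have just one NB: gnb.fit(iris.data, target)
2. If you insist on having multiple NBs, you should merge them by multiplying the probabilities or adding their logarithms (mathematically the same thing, but multiplication is numerically less stable):
pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1]
or
pos = np.exp(np.log(y_prob1[:,1]) + np.log(y_prob2[:,1]) + np.log(y_prob3[:,1]) + np.log(y_prob4[:,1]))
You can also predict the logarithms directly with gnb.predict_log_proba instead of gnb.predict_proba.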
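However, this approach has one flaw: naive Bayes also includes the prior in each of your probabilities, so the prior is counted four times and you get a very skewed distribution. You therefore have to correct for it manually:
pos_prior = gnb1.class_prior_[1]  # all models have the same prior, so we can use the one from gnb1
pos = pos_prior * (y_prob1[:,1]/pos_prior) * (y_prob2[:,1]/pos_prior) * (y_prob3[:,1]/pos_prior) * (y_prob4[:,1]/pos_prior)
which simplifies to
pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1] / pos_prior**3
and, in log space, to
pos = ... - 3 * np.log(pos_prior)
The result is an unnormalized score: compute neg the same way from column 0 and its own prior, then divide each by pos + neg if you need probabilities that sum to 1.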
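The reason for dividing by the prior cubed can be sketched with Bayes' rule. The notation here is introduced for illustration only: c is a class, x_1, ..., x_4 are the four feature values, and the features are assumed conditionally independent given the class, as naive Bayes does.

```latex
\begin{align*}
P(c \mid x_1,\dots,x_4)
  &\propto P(c)\prod_{i=1}^{4} P(x_i \mid c) \\
  &= P(c)\prod_{i=1}^{4}\frac{P(c \mid x_i)\,P(x_i)}{P(c)} \\
  &\propto \frac{\prod_{i=1}^{4} P(c \mid x_i)}{P(c)^{3}}
\end{align*}
```

The factors P(x_i) do not depend on the class, so they drop out once the scores are renormalized over the classes. Each P(c | x_i) is what the corresponding single-feature model returns from predict_proba.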
So once again: you should use option 1.
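As a rough check of the argument above, the sketch below combines the four single-feature models in log space with the prior correction and compares the result to one GaussianNB trained on all features. The names X, y, models, reference and combined are chosen here for illustration and are not from the original post. Agreement is expected only up to small numerical differences, since scikit-learn computes its variance smoothing separately for each fitted model.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X = iris.data
# Binary target as in the question: 1 = first iris class, 2 = the other two
y = np.where(iris.target, 2, 1)

# Option 1: a single classifier trained on all features
reference = GaussianNB().fit(X, y)

# Option 2: one classifier per feature column
models = [GaussianNB().fit(X[:, [j]], y) for j in range(X.shape[1])]

# Sum the per-feature log-posteriors; shape (n_samples, n_classes)
log_post = sum(m.predict_log_proba(X[:, [j]]) for j, m in enumerate(models))

# Every per-feature posterior contains the class prior once;
# keep one copy and remove the remaining len(models) - 1 copies
log_prior = np.log(models[0].class_prior_)
log_joint = log_post - (len(models) - 1) * log_prior

# Renormalize over classes (subtract the row maximum first for stability)
log_joint -= log_joint.max(axis=1, keepdims=True)
combined = np.exp(log_joint)
combined /= combined.sum(axis=1, keepdims=True)

print(np.allclose(combined, reference.predict_proba(X), atol=1e-4))
```

Working with predict_log_proba avoids multiplying very small probabilities, which is the numerical stability concern mentioned in option 2.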
