mfnbc: A Multi-Functional Naive Bayes Classifier
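About the math
This package implements a simple algorithm that computes the final posterior probabilities of a set of features for each text in a corpus.
For each text in the corpus, the package checks whether each word in the text appears in the supplied likelihood table. If the word is found, the posterior probability of every feature is updated using Bayesian statistics.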
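For a feature $F_i$ and a matched word $w$, the update is Bayes' rule:

$$P(F_i \mid w) = \frac{P(w \mid F_i)\,P(F_i)}{\sum_j P(w \mid F_j)\,P(F_j)}$$

where $P(w \mid F_i)$ is the entry for $w$ in column $F_i$ of the likelihood table and $P(F_i)$ is the probability of the feature before the word is seen.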
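The update described above can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual code: the function name `posteriors`, the uniform starting prior, and the tokenizer (lowercase, surrounding punctuation stripped) are assumptions, chosen because they agree with the example results given at the end of this description.

```python
import string

# Likelihood table from the example in this description: P(word | feature).
LIKELIHOODS = {
    "cat":    {"Animal": 0.33, "Human": 0.03, "Plant": 0.05},
    "dog":    {"Animal": 0.33, "Human": 0.02, "Plant": 0.05},
    "leaves": {"Animal": 0.05, "Human": 0.03, "Plant": 0.4},
    "tree":   {"Animal": 0.05, "Human": 0.02, "Plant": 0.4},
    "man":    {"Animal": 0.12, "Human": 0.45, "Plant": 0.05},
    "women":  {"Animal": 0.12, "Human": 0.45, "Plant": 0.05},
}


def posteriors(text, likelihoods, prior=None):
    """Return P(feature | text), applying Bayes' rule once per known word."""
    features = list(next(iter(likelihoods.values())))
    # Assumption: start from a uniform prior when none is supplied.
    post = dict(prior) if prior else {f: 1.0 / len(features) for f in features}
    for token in text.split():
        # Assumption: case-insensitive match, surrounding punctuation ignored.
        word = token.strip(string.punctuation).lower()
        if word not in likelihoods:
            continue  # words missing from the table leave the posterior unchanged
        unnormalized = {f: likelihoods[word][f] * post[f] for f in features}
        total = sum(unnormalized.values())
        post = {f: unnormalized[f] / total for f in features}
    return post


result = posteriors("The tree had lots of leaves", LIKELIHOODS)
print({feature: round(p, 6) for feature, p in result.items()})
```

For this text only `tree` and `leaves` are in the table, so the posterior is proportional to 0.05 × 0.05 for Animal, 0.02 × 0.03 for Human, and 0.4 × 0.4 for Plant, which gives roughly 0.0153, 0.0037, and 0.9810 after normalization.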
Requirements
python=3.3
Installation
pip install mfnbc
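Setup (likelihood input file)
Suppose you have a word-based likelihood table (a CSV file) whose header row consists of the text column Word, followed by one column for each feature to classify.
For example: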
Word | Animal | Human | Plant |
cat | 0.33 | 0.03 | 0.05 |
dog | 0.33 | 0.02 | 0.05 |
leaves | 0.05 | 0.03 | 0.4 |
tree | 0.05 | 0.02 | 0.4 |
man | 0.12 | 0.45 | 0.05 |
women | 0.12 | 0.45 | 0.05 |
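One way to produce such a file is with Python's standard `csv` module. The sketch below assumes a comma-delimited file with a header row; the helper names `write_likelihoods` and `load_likelihoods` are hypothetical and are not part of mfnbc.

```python
import csv

# The example table above: (Word, Animal, Human, Plant).
ROWS = [
    ("cat", 0.33, 0.03, 0.05),
    ("dog", 0.33, 0.02, 0.05),
    ("leaves", 0.05, 0.03, 0.4),
    ("tree", 0.05, 0.02, 0.4),
    ("man", 0.12, 0.45, 0.05),
    ("women", 0.12, 0.45, 0.05),
]


def write_likelihoods(path):
    """Write the example likelihood table as a comma-separated file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Word", "Animal", "Human", "Plant"])
        writer.writerows(ROWS)


def load_likelihoods(path):
    """Read the table back as {word: {feature: likelihood}} to check it."""
    table = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            word = row.pop("Word")
            table[word] = {feature: float(value) for feature, value in row.items()}
    return table
```

Calling `write_likelihoods('likeli_sample.csv')` creates a file with the name used in the examples further down.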
Setup (unlabeled data file)
ID | Text |
1 | The cat is my pet and he is lovely. A dog will not do. |
2 | The man and women had a cat and lived under a tree |
3 | The tree had lots of leaves |
4 | A man lives under a tree with many leaves. A women has a cat as a pet |
5 | The dog and cat chase the man under the tree |
6 | The man and women live in a house. |
The header of the key column is Text. Any other fields are included unmodified in the output file.
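Before running the classifier it can be useful to confirm that the data file has the required header. The helper below is a hypothetical pre-flight check, not part of mfnbc, and it assumes a comma-delimited file whose first row is the header.

```python
import csv


def passthrough_columns(path):
    """Return the non-Text columns of a data file; raise if Text is missing."""
    with open(path, newline="") as fh:
        header = next(csv.reader(fh), [])
    if "Text" not in header:
        raise ValueError("expected a 'Text' column, found: %r" % (header,))
    # Every column other than Text is expected to be copied to the output as-is.
    return [name for name in header if name != "Text"]
```

For the sample file above, `passthrough_columns('input_sample.csv')` would return `['ID']`.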
Import
from mfnbc import MFNBC
Instantiation
m = MFNBC(
    <likelihoods_input_file - location of likelihood table (str)>,
    <unlabeled_data_file - location of unlabeled data file (str)>,
    <verbose output - turn verbose output on or off, default: off>,
    <output filename - defaults to out.csv (str)>
)
Example
m = MFNBC('likeli_sample.csv', 'input_sample.csv', False, 'my_output.csv')
m.read_likelihoods()
m.calc_posteriors()
m.write_csv()
Or you can do it all in one command:
m = MFNBC('likeli_sample.csv', 'input_sample.csv', False).write_csv()
Example results
ID | Text | Animal | Human | Plant |
1 | The cat is my pet and he is lovely. A dog will not do. | 0.972321429 | 0.005357143 | 0.022321429 |
2 | The man and women had a cat and lived under a tree | 0.580787094 | 0.2969934 | 0.122219506 |
3 | The tree had lots of leaves | 0.01532802 | 0.003678725 | 0.980993256 |
4 | A man lives under a tree with many leaves. A women has a cat as a pet | 0.334412386 | 0.1026038 | 0.562983814 |
5 | The dog and cat chase the man under the tree | 0.921839729 | 0.00761851 | 0.070541761 |
6 | The man and women live in a house. | 0.065633546 | 0.922971741 | 0.011394713 |
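To turn these posteriors into a single label per text, one option is to pick the feature with the highest probability. The sketch below hard-codes the rows from the table above; the helper `best_label` is hypothetical and not part of mfnbc.

```python
FEATURES = ("Animal", "Human", "Plant")

# Posteriors copied from the example results, keyed by ID.
RESULTS = {
    1: (0.972321429, 0.005357143, 0.022321429),
    2: (0.580787094, 0.2969934, 0.122219506),
    3: (0.01532802, 0.003678725, 0.980993256),
    4: (0.334412386, 0.1026038, 0.562983814),
    5: (0.921839729, 0.00761851, 0.070541761),
    6: (0.065633546, 0.922971741, 0.011394713),
}


def best_label(probs, features=FEATURES):
    """Return (feature, posterior) for the most probable feature."""
    best = max(range(len(features)), key=lambda i: probs[i])
    return features[best], probs[best]


for text_id, probs in RESULTS.items():
    # Each row is a probability distribution, so it should sum to about 1.
    assert abs(sum(probs) - 1.0) < 1e-6
    label, p = best_label(probs)
    print(text_id, label, p)
```

With these rows, texts 1, 2, and 5 come out as Animal, texts 3 and 4 as Plant, and text 6 as Human. Text 4 shows why the full distribution is worth keeping: its top label has a posterior of only about 0.56.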