A fast multithreaded C++ implementation of NLTK BLEU
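FastBLEU Package
This is a fast multithreaded C++ implementation of NLTK BLEU that computes BLEU and SelfBLEU scores against a fixed reference set. It can return (Self)BLEU for several maximum n-gram orders (e.g. BLEU-2, BLEU-3) simultaneously and efficiently.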
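Each maximum n-gram order n corresponds to a tuple of n weights, like the `(1/2., 1/2.)` and `(1/3., 1/3., 1/3.)` tuples passed in the usage example. The following is a minimal sketch of building uniform weights for several orders at once. The helper name `uniform_weights` and the `bleu-n` keys are hypothetical, and it assumes the dictionary keys are free-form labels, as the usage example suggests.

```python
def uniform_weights(max_orders):
    """Map each maximum n-gram order n to a tuple of n equal weights.

    The 'bleu-n' keys are illustrative labels, not names required by FastBLEU.
    """
    return {f"bleu-{n}": tuple(1.0 / n for _ in range(n)) for n in max_orders}


weights = uniform_weights([2, 3, 4])
print(weights["bleu-2"])  # (0.5, 0.5)
```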
Installation
PyPI (latest stable version)
pip install --user FastBLEU
Sample Usage
Here is an example that computes BLEU-2, BLEU-3, SelfBLEU-2 and SelfBLEU-3:
>>> from fast_bleu import BLEU, SelfBLEU
>>> ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...         'ensures', 'that', 'the', 'military', 'will', 'forever',
...         'heed', 'Party', 'commands']
>>> ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...         'guarantees', 'the', 'military', 'forces', 'always',
...         'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...         'army', 'always', 'to', 'heed', 'the', 'directions',
...         'of', 'the', 'party']
>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']
>>> list_of_references = [ref1, ref2, ref3]
>>> hypotheses = [hyp1, hyp2]
>>> weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}
>>> bleu = BLEU(list_of_references, weights)
>>> bleu.get_score(hypotheses)
{'bigram': [0.7453559924999299, 0.0191380231127159], 'trigram': [0.6240726901657495, 0.013720869575946234]}
which means:
- BLEU-2 for hyp1 is 0.7453559924999299
- BLEU-2 for hyp2 is 0.0191380231127159
- BLEU-3 for hyp1 is 0.6240726901657495
- BLEU-3 for hyp2 is 0.013720869575946234
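The BLEU-2 value for hyp1 can be checked by hand. Below is a plain-Python sketch of unsmoothed sentence BLEU (clipped n-gram precision plus brevity penalty). It is not FastBLEU's C++ code, and every function name in it is illustrative. Because it omits smoothing, it returns 0.0 whenever an n-gram order has no match, whereas the output above shows a small nonzero score for hyp2, which suggests the library applies some form of smoothing that is not reproduced here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, hypothesis, n):
    """Return (clipped matches, total hypothesis n-grams)."""
    hyp_counts = ngrams(hypothesis, n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
    return matched, sum(hyp_counts.values())


def unsmoothed_bleu(references, hypothesis, weights):
    """Sentence BLEU without smoothing (illustrative only)."""
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        matched, total = modified_precision(references, hypothesis, n)
        if matched == 0:
            return 0.0  # no smoothing: one empty order zeroes the score
        log_sum += w * math.log(matched / total)
    # Brevity penalty against the reference length closest to the hypothesis.
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - hyp_len), length))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_sum)


ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
        'ensures', 'that', 'the', 'military', 'will', 'forever',
        'heed', 'Party', 'commands']
ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
        'guarantees', 'the', 'military', 'forces', 'always',
        'being', 'under', 'the', 'command', 'of', 'the', 'Party']
ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
        'army', 'always', 'to', 'heed', 'the', 'directions',
        'of', 'the', 'party']
hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
        'ensures', 'that', 'the', 'military', 'always',
        'obeys', 'the', 'commands', 'of', 'the', 'party']

references = [ref1, ref2, ref3]
score = unsmoothed_bleu(references, hyp1, (0.5, 0.5))
print(round(score, 6))  # 0.745356
```

For hyp1 the clipped precisions are 17/18 for unigrams and 10/17 for bigrams, and the brevity penalty is 1 because ref2 has the same length as hyp1, so the score is sqrt(10/18), which agrees with the reported 0.7453559924999299.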
>>> self_bleu = SelfBLEU(list_of_references, weights)
>>> self_bleu.get_score()
{'bigram': [0.25819888974716115, 0.3615507630310936, 0.37080992435478316], 'trigram': [0.07808966062765045, 0.20140620205719248, 0.21415334758254043]}
which means:
- SelfBLEU-2 for ref1 is 0.25819888974716115
- SelfBLEU-2 for ref2 is 0.3615507630310936
- SelfBLEU-2 for ref3 is 0.37080992435478316
- SelfBLEU-3 for ref1 is 0.07808966062765045
- SelfBLEU-3 for ref2 is 0.20140620205719248
- SelfBLEU-3 for ref3 is 0.21415334758254043
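SelfBLEU is usually defined by scoring each sentence of a set against all the remaining sentences, which is why one score per reference is returned. The sketch below shows only that leave-one-out pairing in plain Python. It assumes FastBLEU follows this common definition, which this README does not spell out, and the helper name `leave_one_out` is hypothetical.

```python
def leave_one_out(sentences):
    """Yield (hypothesis, references) pairs: each sentence vs. all the others."""
    for i, sentence in enumerate(sentences):
        yield sentence, sentences[:i] + sentences[i + 1:]


# Toy token lists standing in for ref1, ref2 and ref3.
corpus = [['a', 'b'], ['b', 'c'], ['c', 'd']]
for hypothesis, references in leave_one_out(corpus):
    print(hypothesis, 'scored against', references)
```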
Caution: each token of the reference set is converted to string format during computation.
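One practical consequence is that non-string tokens, such as integer vocabulary ids, are compared by their string form. The sketch below imitates the conversion with Python's `str()`. That choice is an assumption, since the exact conversion happens inside the library, and the helper name `normalize_tokens` is hypothetical.

```python
def normalize_tokens(sentence):
    """Convert every token to str, imitating the conversion described above."""
    return [str(token) for token in sentence]


reference = [1, 2, 'cat', 3.0]
hypothesis = ['1', '2', 'cat', '3.0']

# After conversion, the integer id 1 and the string '1' are the same token.
print(normalize_tokens(reference) == hypothesis)  # True
```

Converting hypotheses to strings yourself before scoring avoids surprises when the two sides use different token types.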
For further details, refer to the documentation provided in the source code.
Citation
Please cite our paper if it helps with your research.
- ACL Anthology: https://www.aclweb.org/anthology/W19-2311
- arXiv link: https://arxiv.org/abs/1904.03971
@article{https://arxiv.org/abs/1904.03971,
  title={Jointly Measuring Diversity and Quality in Text Generation Models},
  author={Montahaei, Ehsan and Alihosseini, Danial and Baghshah, Mahdieh Soleymani},
  journal={NAACL HLT 2019},
  pages={90},
  year={2019}
}