A fast multithreaded C++ implementation of NLTK BLEU with a Python wrapper.

Detailed description of the fast-bleu Python project


fast-bleu Package

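This is a fast multithreaded C++ implementation of NLTK BLEU with a Python wrapper. It computes BLEU and SelfBLEU scores for a fixed reference set, and it can efficiently return (Self)BLEU for several (maximum) n-gram orders at once (e.g. BLEU-2, BLEU-3, etc.).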

Installation

Linux and WSL

Installing the PyPI latest stable release:

pip install --user fast-bleu

MacOS

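Because macOS uses clang, which does not support OpenMP, one workaround is to first install gcc with brew install gcc. After that, gcc-specific binaries are added (for example, gcc-10 and g++-10).

To change the default compiler, an option is added to the install command. You can therefore install the PyPI latest stable release with the following command: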

^{pr2}$

Windows

Not tested yet!

Sample Usage

Here is an example that computes BLEU-2, BLEU-3, SelfBLEU-2 and SelfBLEU-3:

>>> from fast_bleu import BLEU, SelfBLEU
>>> ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...         'ensures', 'that', 'the', 'military', 'will', 'forever',
...         'heed', 'Party', 'commands']
>>> ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...         'guarantees', 'the', 'military', 'forces', 'always',
...         'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...         'army', 'always', 'to', 'heed', 'the', 'directions',
...         'of', 'the', 'party']
>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']
>>> list_of_references = [ref1, ref2, ref3]
>>> hypotheses = [hyp1, hyp2]
>>> weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}
>>> bleu = BLEU(list_of_references, weights)
>>> bleu.get_score(hypotheses)
{'bigram': [0.7453559924999299, 0.0191380231127159], 'trigram': [0.6240726901657495, 0.013720869575946234]}

which means:

  • BLEU-2 for hyp1 is 0.7453559924999299

  • BLEU-2 for hyp2 is 0.0191380231127159

  • BLEU-3 for hyp1 is 0.6240726901657495

  • BLEU-3 for hyp2 is 0.013720869575946234

>>> self_bleu = SelfBLEU(list_of_references, weights)
>>> self_bleu.get_score()
{'bigram': [0.25819888974716115, 0.3615507630310936, 0.37080992435478316], 'trigram': [0.07808966062765045, 0.20140620205719248, 0.21415334758254043]}

which means:

  • SelfBLEU-2 for ref1 is 0.25819888974716115

  • SelfBLEU-2 for ref2 is 0.3615507630310936

  • SelfBLEU-2 for ref3 is 0.37080992435478316

  • SelfBLEU-3 for ref1 is 0.07808966062765045

  • SelfBLEU-3 for ref2 is 0.20140620205719248

  • SelfBLEU-3 for ref3 is 0.21415334758254043
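The one-score-per-reference output above is consistent with the usual leave-one-out reading of SelfBLEU: each sentence is scored as a hypothesis against all the other sentences in the set. The sketch below illustrates only that loop, under that assumption. Both self_scores and the toy unigram_overlap scorer are hypothetical names, and the toy scorer stands in for BLEU purely to keep the example short.

```python
def unigram_overlap(references, hypothesis):
    """Toy scorer (not BLEU): fraction of hypothesis tokens found in any reference."""
    vocabulary = {token for ref in references for token in ref}
    if not hypothesis:
        return 0.0
    return sum(token in vocabulary for token in hypothesis) / len(hypothesis)


def self_scores(sentences, score_fn):
    """Score each sentence against all the others (leave-one-out)."""
    scores = []
    for i, sentence in enumerate(sentences):
        others = sentences[:i] + sentences[i + 1:]  # drop the sentence itself
        scores.append(score_fn(others, sentence))
    return scores


corpus = [['a', 'b'], ['a', 'c'], ['d']]
print(self_scores(corpus, unigram_overlap))  # [0.5, 0.5, 0.0]
```

Passing a real BLEU function as score_fn would give one SelfBLEU value per sentence, in the same order as the input list.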

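Caution: during computation, each token of the reference set is converted to string format.

For more details, please refer to the documentation provided in the source code.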
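One practical consequence of this caution is that tokens of different types can become indistinguishable. The sketch below assumes the conversion behaves like Python's built-in str(); the library performs it on the C++ side, so the exact string form of non-string tokens may differ. The helper name as_strings is hypothetical.

```python
def as_strings(tokens):
    """Convert every token to its string form, as a stand-in for the library's conversion."""
    return [str(token) for token in tokens]


int_tokens = [2019, 'was', 1]
str_tokens = ['2019', 'was', '1']

# After conversion the integer 2019 and the string '2019' are the same token.
print(as_strings(int_tokens) == as_strings(str_tokens))  # True
```

If the distinction between such tokens matters, it is safer to map tokens to unambiguous strings before passing them to the scorer.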

Citation

Please cite our paper if it helps with your research.

@inproceedings{alihosseini-etal-2019-jointly,
    title = {Jointly Measuring Diversity and Quality in Text Generation Models},
    author = {Alihosseini, Danial  and
      Montahaei, Ehsan  and
      Soleymani Baghshah, Mahdieh},
    booktitle = {Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation},
    month = {jun},
    year = {2019},
    address = {Minneapolis, Minnesota},
    publisher = {Association for Computational Linguistics},
    url = {https://www.aclweb.org/anthology/W19-2311},
    doi = {10.18653/v1/W19-2311},
    pages = {90--98},
}
