This summarizer attempts to use byte-pair encoding (BPE) tokenization and the Bart vocabulary to filter text by semantic meaning.



BPE Summarizer



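BPE text representation is a subword-level tokenization method that aims to reuse parts of words efficiently while preserving their semantic value.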
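To make "reusing parts of words" concrete, here is a minimal sketch of applying an ordered list of BPE merge rules to split words into subword tokens. The merge list and the `bpe_segment` function are hypothetical and chosen for illustration only; they are not Bart's vocabulary or the library's code. Note how the piece `er` is shared by both words.

```python
# Hypothetical merge rules, in the order they were learned.
# Invented for illustration; this is NOT Bart's real merge table.
MERGES = [("e", "r"), ("l", "o"), ("lo", "w"), ("n", "e"), ("ne", "w")]


def bpe_segment(word, merges):
    """Split a word into characters, then apply each merge rule in order."""
    symbols = list(word)
    for left, right in merges:
        i = 0
        merged = []
        while i < len(symbols):
            # Merge this adjacent pair if it matches the current rule.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


print(bpe_segment("lower", MERGES))  # ['low', 'er']
print(bpe_segment("newer", MERGES))  # ['new', 'er']
```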

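The algorithm is based on the frequency of adjacent symbol pairs (n-gram pairs): the most frequent pairs are merged repeatedly, so frequently occurring sequences end up represented by larger tokens.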
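The frequency-driven merging can be sketched as a toy learner in the style of the original BPE procedure: count adjacent pairs, merge the most frequent one, repeat. This is a simplified illustration under our own assumptions (character-level symbols, no end-of-word marker, ties broken by first occurrence); Bart's actual vocabulary was learned at the byte level on a large corpus, and `learn_bpe` is a hypothetical name.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn BPE merges from a list of words (toy, character-level version)."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first one on ties)
        merges.append(best)
        # Replace every occurrence of the best pair with one merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
print(learn_bpe(corpus, 3))  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

The frequent suffix `est` becomes a single, larger token after two merges, which is the property the summarizer's hypothesis builds on.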

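This project explores the hypothesis that token size is closely correlated with semantic meaning. The summarization approach aims to surface the most meaningful sentences by comparing token values and keeping the sentences of the original text that contain tokens within a specified top percentile.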
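A minimal sketch of the percentile filter described above, assuming a token's "value" is simply its character length and using whitespace splitting as a stand-in tokenizer. Both choices are assumptions for illustration; bpe-summarizer itself uses a Bart BPE tokenizer and may score tokens differently. The function names `percentile` and `summarize` and the `top_percentile` parameter are hypothetical.

```python
import re


def percentile(values, q):
    """Linearly interpolated q-th percentile (0-100) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(text, tokenize, top_percentile=90):
    """Keep sentences that contain a token at or above the percentile cutoff.

    ASSUMPTION: token value = character length of the token.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokenized = [tokenize(s) for s in sentences]
    all_values = [len(tok) for toks in tokenized for tok in toks]
    if not all_values:
        return ""
    cutoff = percentile(all_values, top_percentile)
    kept = [s for s, toks in zip(sentences, tokenized)
            if any(len(tok) >= cutoff for tok in toks)]
    return " ".join(kept)


text = "The cat sat. Dependency structures matter. It is ok."
print(summarize(text, str.split))  # Dependency structures matter.
```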

Installation

pip install bpe-summarizer

Usage


Parameters

Parameter | Definition | Default | Type
… | A text blob with sentences delineated by punctuation | … | …
… | Sentences that include tokens in the top kth percentile will remain after summarization | … | …
… | A Hugging Face … instance that relies on byte-pair encoding | … | …
… | If …, summarization will be applied at both the document level and the sentence level | … | …
… | When … is …, this percentile will be applied to individual sentences | …* | …
  • Note: if intra_sentence_percentile is lower than the percentile score of the mean token value, it is ignored and the percentile score of the mean is used instead.
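One reading of the note on intra_sentence_percentile is that the effective sentence-level percentile is the larger of the requested value and the percentile rank of the mean token value. The sketch below assumes that reading and a "weak" percentile rank (share of values less than or equal to the score); other definitions exist, for example the `kind` options of `scipy.stats.percentileofscore`. The helper names are hypothetical, not the library's API.

```python
from statistics import mean


def percentile_of_score(values, score):
    """Percentage of values less than or equal to score ("weak" rank)."""
    return 100.0 * sum(v <= score for v in values) / len(values)


def effective_intra_sentence_percentile(token_values, intra_sentence_percentile):
    """Never let the sentence-level percentile drop below the mean's rank.

    ASSUMPTION: this is our interpretation of the README note.
    """
    mean_rank = percentile_of_score(token_values, mean(token_values))
    # A requested percentile below the mean's percentile score is ignored.
    return max(intra_sentence_percentile, mean_rank)


values = [2, 3, 4, 10, 11]  # mean is 6, and 3 of 5 values are <= 6
print(effective_intra_sentence_percentile(values, 50))  # 60.0
print(effective_intra_sentence_percentile(values, 80))  # 80
```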

Example

Human summary

Building Deep Dependency Structures Using A Wide-Coverage CCG Parser

This paper describes a wide-coverage statistical parser that uses Combinatory Categorial Grammar (CCG) to derive dependency structures.

The parser differs from most existing wide-coverage treebank parsers in capturing the long-range dependencies inherent in constructions such as coordination, extraction, raising and control, as well as the standard local predicate-argument dependencies.

A set of dependency structures used for training and testing the parser is obtained from a treebank of CCG normal-form derivations, which have been derived (semi-) automatically from the Penn Treebank.

The parser correctly recovers over 80% of labelled dependencies, and around 90% of unlabelled dependencies.

We provide examples showing how heads can fill dependency slots during a derivation, and how long-range dependencies can be recovered through unification of co-indexed head variables.

We define predicate argument structure for CCG in terms of the dependencies that hold between words with lexical functor categories and their arguments.

BPE summary

Building Deep Dependency Structures Using A Wide-Coverage CCG Parser

This paper describes a wide-coverage statistical parser that uses Combinatory Categorial Grammar (CCG) to derive dependency structures.

The parser differs from most existing wide-coverage treebank parsers in capturing the long-range dependencies inherent in constructions such as coordination, extraction, raising and control, as well as the standard local predicate-argument dependencies.

A set of dependency structures used for training and testing the parser is obtained from a treebank of CCG normal-form derivations, which have been derived (semi-) automatically from the Penn Treebank. The parser correctly recovers over 80% of labelled dependencies, and around 90% of unlabelled dependencies. However, the dependencies are typically derived from a context-free phrase structure.

Evaluation

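To evaluate summary quality, we use a semantic similarity metric to compare example automatic summaries with human-written summaries from the scisummnet dataset. Texts are represented with sentence-level embeddings. Figure 1 compares the results of the BPE summarizer with a widely used summarization technique. Across 100 samples it performed competitively, completing summarization within a hundredth of a second, compared with 55 seconds for the widely used technique.*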

Figure 1. Side-by-side evaluation against a widely used summarizer

*Performance was evaluated on a CPU, and the competing technique was applied after being split out to use only its summarization component.
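The semantic similarity comparison described in the evaluation can be sketched as follows, assuming cosine similarity as the metric and mean pooling of sentence embeddings into one vector per text. Both are assumptions: the README does not name the exact metric or pooling, and the tiny 3-dimensional vectors below stand in for real sentence-embedding model outputs.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors (mean pooling)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical sentence embeddings for an automatic and a human summary.
automatic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
human = [[1.0, 0.0, 0.0]]
score = cosine_similarity(mean_vector(automatic), mean_vector(human))
print(round(score, 3))  # 0.707
```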

Citation:
