Text2Text: generate questions and summaries for your texts


Input your text and get questions and summaries in return!

Citation

To cite this work, please use the following BibTeX citation.

@misc{text2text@2020,
  author={Wangperawong, Artit},
  title={Text2Text: generate questions and summaries for your texts},
  year={2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/artitw/text2text}},
  url = {https://github.com/artitw/text2text}
}

Requirements

Installation

PyTorch Extension (Apex)


Text2Text

pip install text2text

Examples

Colab demo

Open In Colab

Demo video

Obtain some texts

notre_dame_str = "As at most other universities, Notre Dame's students run a number of news media outlets. The nine student - run outlets include three newspapers, both a radio and television station, and several magazines and journals. Begun as a one - page journal in September 1876, the Scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the United States. The other magazine, The Juggler, is released twice a year and focuses on student literature and artwork. The Dome yearbook is published annually. The newspapers have varying publication interests, with The Observer published daily and mainly reporting university and other news, and staffed by students from both Notre Dame and Saint Mary's College. Unlike Scholastic and The Dome, The Observer is an independent publication and does not have a faculty advisor or any editorial oversight from the University. In 1987, when some students believed that The Observer began to show a conservative bias, a liberal newspaper, Common Sense was published. Likewise, in 2003, when other students believed that the paper showed a liberal bias, the conservative paper Irish Rover went into production. Neither paper is published as often as The Observer; however, all three are distributed to all students. Finally, in Spring 2008 an undergraduate journal for political science research, Beyond Politics, made its debut."

bacteria_str = "Bacteria are a type of biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria have a number of shapes, ranging from spheres to rods and spirals. Bacteria were among the first life forms to appear on Earth, and are present in most of its habitats."

bio_str = "Biology is the science that studies life. What exactly is life? This may sound like a silly question with an obvious answer, but it is not easy to define life. For example, a branch of biology called virology studies viruses, which exhibit some of the characteristics of living entities but lack others. It turns out that although viruses can attack living organisms, cause diseases, and even reproduce, they do not meet the criteria that biologists use to define life."

Question Generation

from text2text.text_generator import TextGenerator
qg = TextGenerator(output_type="question")

qg.predict([
    bio_str,
    bio_str,
    bio_str,
    bio_str,
    bio_str,
    "I will go to school today to take my math exam.",
    "I will go to school today to take my math exam.",
    "Tomorrow is my cousin's birthday. He will turn 24 years old.",
    notre_dame_str,
    bacteria_str,
    bacteria_str,
    bacteria_str,
    "I will go to school today to take my math exam. [SEP] school",
    "I will go to school today to take my math exam. [SEP] exam",
    "I will go to school today to take my math exam. [SEP] math",
])

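Generated questions

Note that the last three answers are controlled by appending the [SEP] token and the desired answer to the corresponding inputs above. Each result is a (question, answer) pair.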

[('What is biology the science that studies?', 'life'),
 ('What is the study of life?', 'studies'),
 ('What would you find the question " life "?', 'sound'),
 ('What can viruses do to living organisms?', 'attack'),
 ('What is the study of life?', 'studies'),
 ('Where will I go to to take my math exam?', 'school'),
 ('Where will I go to to take my math exam?', 'school'),
 ("What will my cousin's birthday?", 'turn'),
 ('What type of oversight does The Observer not have?', 'editorial'),
 ('What shape can bacteria be found in?', 'rods'),
 ('What is the typical length of bacteria?', 'micrometres'),
 ('What is the typical length of bacteria?', 'micrometres'),
 ('Where will I go to to take my math exam?', 'school'),
 ('What will I take after school?', 'exam'),
 ('What exam will I take?', 'math')]
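
The answer-controlled inputs above all follow the same pattern: the context, the [SEP] token, then the desired answer. A minimal sketch of building such inputs programmatically is shown below. The helper `with_answer` and the constant `SEP_TOKEN` are hypothetical names introduced for illustration and are not part of text2text; the check that the answer occurs in the context is also an assumption, not documented library behavior.

```python
# Hypothetical helper, not part of the text2text package.
SEP_TOKEN = "[SEP]"

def with_answer(context: str, answer: str) -> str:
    """Return '<context> [SEP] <answer>' for answer-controlled question generation."""
    # Assumption: the target answer should be a span that occurs in the context.
    if answer not in context:
        raise ValueError(f"answer {answer!r} does not occur in the context")
    return f"{context} {SEP_TOKEN} {answer}"

sentence = "I will go to school today to take my math exam."
controlled_inputs = [with_answer(sentence, a) for a in ("school", "exam", "math")]

for item in controlled_inputs:
    print(item)
```

The resulting list reproduces the last three input strings of the call above, so it could be passed to qg.predict in their place.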

Summary Generation

from text2text import TextGenerator
sg = TextGenerator(output_type="summary")
sg.predict([notre_dame_str, bacteria_str, bio_str])

Generated summaries

["Notre Dame's students run nine student - run outlets . [X_SEP] Scholastic magazine claims to be the oldest continuous collegiate publication in the United States . [X_SEP] The Observer is an independent publication .",
 'Bacteria were among the first life forms to appear on Earth .',
 'biology is the science that studies life .']
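
The first summary above contains [X_SEP] markers between its sentences. A small post-processing sketch follows, assuming that [X_SEP] only ever acts as a sentence separator in the generated text (the source does not state this explicitly). `split_summary` is a hypothetical helper, not a text2text function.

```python
from typing import List

X_SEP = "[X_SEP]"

def split_summary(summary: str) -> List[str]:
    """Split a generated summary into sentences on the [X_SEP] marker."""
    # Strip surrounding whitespace and drop empty fragments.
    return [part.strip() for part in summary.split(X_SEP) if part.strip()]

example = (
    "Notre Dame's students run nine student - run outlets . [X_SEP] "
    "Scholastic magazine claims to be the oldest continuous collegiate "
    "publication in the United States . [X_SEP] "
    "The Observer is an independent publication ."
)

sentences = split_summary(example)
for sentence in sentences:
    print(sentence)
```

A summary without the marker, such as the second and third outputs above, comes back as a single-element list.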

Questions?

For questions or help using Text2Text, please submit a GitHub issue.

Acknowledgements

This package is based on UniLM.
