Question answering + question generation in NLP

Posted 2024-06-02 06:47:32


I have a dataset of about 3K to 4K Excel files, each with roughly 12K records. The records are a mix of FAQs, email conversations, blog comments, chats, and so on.

The best part is that each file has two columns, one for the question and one for the answer.

Here is a sample record from one of the Excel files. (Note: I cannot disclose client data, so I made up this record myself to explain the scenario.)

Sample question: What are IIT colleges in India?

Sample answer: The Indian Institutes of Technology (IITs) are autonomous public institutes of higher education, located in India. They are governed by the Institutes of Technology Act, 1961 which has declared them as institutions of national importance and lays down their powers, duties, and framework for governance. The Institutes of Technology Act, 1961 lists twenty-three institutes. Each IIT is autonomous, linked to the others through a common council (IIT Council), which oversees their administration. The Minister of Human Resource Development is the ex officio Chairperson of the IIT Council. As of 2018, the total number of seats for undergraduate programs in all IITs is 11,279.

The client's requirement is:

Generate as many simple questions as possible from the paragraph (the sample answer above), along with their answers, and append them to the same Excel file.

(He will then process each Excel file further and feed it into some tool that generates chatbot stories.)

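For example:

  • Is it autonomous? (Answer: Yes)
  • What governs the IITs? (Answer: The Institutes of Technology Act, 1961)
  • In which country is it located? (Answer: India)
  • How many institutes does the Institutes of Technology Act, 1961 list? (Answer: twenty-three)

and so on.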
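
The "append to the same Excel file" step could look roughly like the sketch below. It works on in-memory (question, answer) rows only. Reading and writing the workbook itself would need a library such as openpyxl or pandas, which is left out here. The names `expand_rows` and `toy_generate` are hypothetical, and `toy_generate` is just a stand-in for whatever question generator ends up being used.

```python
def expand_rows(rows, generate):
    """Return the original (question, answer) rows followed by generated ones.

    `rows` is a list of (question, answer) tuples read from one sheet.
    `generate` is a hypothetical callable that maps an answer paragraph
    to a list of new (question, answer) tuples.
    """
    new_rows = []
    seen = set(rows)
    for _question, answer in rows:
        for pair in generate(answer):
            if pair not in seen:  # skip pairs that are already present
                seen.add(pair)
                new_rows.append(pair)
    return list(rows) + new_rows


def toy_generate(answer):
    # Stand-in generator for demonstration only (an assumption, not a real model).
    if "India" in answer:
        return [("In which country is it located?", "India")]
    return []


rows = [("What are IIT colleges in India?", "The IITs are located in India.")]
expanded = expand_rows(rows, toy_generate)
for question, answer in expanded:
    print(question, "->", answer)
```

Keeping the original rows first and the generated rows after them means the existing records stay untouched, so the file can still be traced back to the source data.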

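For answer generation I can use AllenAI, but I don't know how to generate the questions. I tried one repo, but it looks incomplete and needs more work. Since I am new to NLP and ML, I don't know how to make those changes.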

Can anyone help with generating questions for question answering?

Can I build a model on top of an existing language model (such as spaCy's models) to extract entities and then generate questions from them?
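
To make the entity idea concrete, here is a minimal sketch of template-based question generation, assuming the entities have already been extracted. It does not call spaCy itself: `entities` is a plain list of (text, label) tuples, which with spaCy you would typically collect as `[(ent.text, ent.label_) for ent in doc.ents]`. The `WH_PHRASES` mapping, the function names, and the naive sentence splitter are all assumptions made for illustration, not part of any library.

```python
import re

# Hypothetical mapping from entity labels to question phrases. The label
# names follow the scheme used by spaCy's English models; the phrases
# are an assumption chosen for this sketch.
WH_PHRASES = {
    "GPE": "which place",
    "CARDINAL": "how many",
    "DATE": "which date",
    "PERSON": "who",
    "ORG": "which organization",
}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def generate_questions(text, entities):
    """Return (question, answer) pairs by swapping each entity for a wh-phrase."""
    pairs = []
    for sentence in split_sentences(text):
        for ent_text, label in entities:
            wh = WH_PHRASES.get(label)
            if wh is None:
                continue  # no template for this label
            # Word boundaries stop "India" from matching inside "Indian".
            pattern = re.compile(r"\b" + re.escape(ent_text) + r"\b")
            if not pattern.search(sentence):
                continue
            # Replace the first match only, then end the sentence with '?'.
            question = pattern.sub(wh, sentence, count=1).rstrip(".!?") + "?"
            pairs.append((question, ent_text))
    return pairs


text = (
    "The Indian Institutes of Technology (IITs) are located in India. "
    "The Institutes of Technology Act, 1961 lists twenty-three institutes."
)
entities = [("India", "GPE"), ("twenty-three", "CARDINAL")]
pairs = generate_questions(text, entities)
for question, answer in pairs:
    print(question, "->", answer)
```

The questions this produces are cloze-style ("... lists how many institutes?") rather than fluent wh-questions, and yes/no questions such as "Is it autonomous?" are out of reach for this approach. It may still be a workable starting point before moving to a trained question-generation model.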

