The goal of this project is twofold. The first is to create synthetic QA corpora from unlabeled datasets for training custom QA models. The second is to develop general-purpose question generation models for use cases such as FAQ generation and test creation.

Specifically, three types of models are developed using T5:

  1. Single-task, answer-aware question generation.
  2. A multi-task model that extracts answer-like spans, generates questions for those spans, and does QA.
  3. End-to-end question generation (without providing explicit answer spans).

You can play with these models on the HuggingFace model hub using the inference API.

All training details can be found in this wandb project.

All of these models are trained using the unified text-to-text approach.
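As a rough illustration of what a unified text-to-text setup can look like, the sketch below turns each task into a plain input string with a task prefix. The prefixes (`generate question:`, `extract answers:`, `question: ... context: ...`, `generate questions:`) and the `<hl>` highlight token are assumptions made for this example and may not match the formats the models were actually trained with; check the repo for those.

```python
# Hypothetical input formats for a text-to-text QG/QA model.
# The task prefixes and the <hl> token are illustrative assumptions,
# not necessarily the formats used by the trained models.

HL = "<hl>"  # assumed marker placed around the span of interest


def format_answer_aware_qg(context: str, answer: str) -> str:
    """Answer-aware QG: highlight the first occurrence of the answer.

    Raises ValueError if the answer does not occur in the context.
    """
    start = context.index(answer)
    end = start + len(answer)
    highlighted = f"{context[:start]}{HL} {answer} {HL}{context[end:]}"
    return f"generate question: {highlighted}"


def format_answer_extraction(sentence: str) -> str:
    """Answer span extraction: highlight the sentence to mine for answers."""
    return f"extract answers: {HL} {sentence} {HL}"


def format_qa(question: str, context: str) -> str:
    """QA: the model is expected to output the answer text."""
    return f"question: {question} context: {context}"


def format_end_to_end_qg(context: str) -> str:
    """End-to-end QG: no answer span is given."""
    return f"generate questions: {context}"


if __name__ == "__main__":
    print(format_answer_aware_qg("42 is the answer to life.", "42"))
```

Because every task is just a string-to-string mapping, a single model can in principle be trained on a mixture of these inputs, which is what makes the multi-task variant possible.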

The multi-task model can do QA, QG, and answer span extraction, and its performance is comparable to or better than that of its single-task counterparts. This is very useful for deployment, because question generation systems usually need three models:

  1. a first model to extract answer-like spans,
  2. a second model to generate a question for each answer, and
  3. a third, a QA model, which takes the generated question and produces an answer.

We can then compare the two answers to check whether the generated question is correct. Running three models for a single problem adds a lot of complexity, so a multi-task model is definitely useful here.
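The three-step check described above can be sketched as a small filtering loop. This is a minimal sketch, not the project's implementation: the three callables are hypothetical stand-ins for whatever models are used (with the multi-task model, all three could wrap the same model), and comparing answers by normalized exact match is an assumption, since the text only says the two answers are compared.

```python
import re
import string
from typing import Callable, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def filter_generated_pairs(
    context: str,
    extract_answers: Callable[[str], Iterable[str]],   # hypothetical model wrapper
    generate_question: Callable[[str, str], str],      # hypothetical model wrapper
    answer_question: Callable[[str, str], str],        # hypothetical model wrapper
) -> List[Tuple[str, str]]:
    """Keep (question, answer) pairs whose QA answer matches the extracted one."""
    kept = []
    for answer in extract_answers(context):
        question = generate_question(context, answer)
        predicted = answer_question(question, context)
        # Assumed criterion: normalized exact match between the two answers.
        if normalize(predicted) == normalize(answer):
            kept.append((question, answer))
    return kept


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model.
    ctx = "The Eiffel Tower is in Paris. It opened in 1889."
    pairs = filter_generated_pairs(
        ctx,
        extract_answers=lambda c: ["Paris", "1889"],
        generate_question=lambda c, a: f"What about {a}?",
        answer_question=lambda q, c: "paris." if "Paris" in q else "1890",
    )
    print(pairs)
```

A softer criterion, such as token-level overlap above a threshold, would keep more pairs at the cost of letting through some mismatched ones.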

Currently these models can generate only factoid questions, since they are trained on the SQuAD dataset. The next step is to find the right datasets for generating non-factoid questions and questions whose answers are not an explicit part of the original text.

Check the repo for more details.

