/question-generation

EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain

Primary LanguagePython

EduQG

This repository contains the code for EduQG:

EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain

If you use part of the code/dataset please cite:

@misc{2210.06104,
Author = {Amir Hadifar and Semere Kiros Bitew and Johannes Deleu and Chris Develder and Thomas Demeester},
Title = {EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain},
Year = {2022},
Eprint = {arXiv:2210.06104},
}

The structure of dataset files (raw_data/qg_train_v0.json and raw_data/qg_valid_v0.json) is shown in the following figure. Each file is a list of chapters with various attributes such as: intro, chapter_text, bname, etc.

alt text

Pre-requisites

pip install -r requirements.txt

Train QG from scratch

 sh run_qg_exp.sh

Load answer-agnostic model:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("hadifar/openstax_qg_agno")

model = AutoModelForSeq2SeqLM.from_pretrained("hadifar/openstax_qg_agno")

Acknowledge

This code is based on repo from here.