This repository contains the code for the BEA 2023 submission MultiQG-TI: Towards Question Generation from Multi-modal Sources
- Dependencies
  - pytorch
  - pytorch_lightning
  - transformers
  - evaluate
  - tensorflow
  - sentence_transformers
  - faiss
  - openai
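The dependency list above could be captured in a `requirements.txt`; note that the PyPI package names below are assumptions (e.g. `torch` for pytorch, `faiss-cpu` for faiss), and the repository may require specific pinned versions:

```text
torch
pytorch-lightning
transformers
evaluate
tensorflow
sentence-transformers
faiss-cpu
openai
```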
- Code for MultiQG using Flan-T5
  - `QG-T5-train.py`: the main training script
  - `QG-T5-inference.py`: the main inference (generation) script
  - `QG-T5-scoring.py`: the main scoring script with the eval metrics
- Code for MultiQG using the OpenAI API
  - `QG-openai-inference.py`: the main inference (generation) script
  - `QG-openai-scoring.py`: the main scoring script
- Utilities
  - `utils_model.py`: model utilities for Flan-T5
  - `utils_dataset.py`: dataset utilities, including preprocessing steps
  - `utils.py`: other misc utilities, including seeding and perplexity computation
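As a rough illustration of the seeding utility mentioned above, a minimal sketch is shown below; the function name `set_seed` is an assumption, and the actual helper in `utils.py` presumably also seeds torch (`torch.manual_seed` and friends), which is omitted here to stay dependency-free:

```python
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Seed the Python and NumPy RNGs so runs are reproducible.

    A sketch only: the real utility likely also seeds torch and CUDA.
    """
    random.seed(seed)
    np.random.seed(seed)
```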
- Scripts
  - `utils_script_similarity.py`: script to compute the cosine similarity used to select examples for ChatGPT few-shot in-context learning
  - `download_dataset.sh`: script to download the ScienceQA dataset
  - `generate_description_from_img.py`: script to generate descriptions from images
  - `extract_text_from_img.py`: script to extract the texts from the images
  - `plot_results.ipynb`: notebook to visualize/reproduce selected results in the paper
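The example-selection idea behind `utils_script_similarity.py` can be sketched with plain NumPy cosine similarity (the actual script uses sentence embeddings and faiss; the function name `top_k_similar` and the toy vectors are assumptions for illustration):

```python
import numpy as np


def top_k_similar(query_vec: np.ndarray, example_vecs: np.ndarray, k: int = 2) -> np.ndarray:
    """Indices of the k stored examples most cosine-similar to the query.

    In the real pipeline, rows of example_vecs would be sentence embeddings
    of candidate few-shot examples.
    """
    q = query_vec / np.linalg.norm(query_vec)
    E = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    sims = E @ q                   # cosine similarity of each example to the query
    return np.argsort(-sims)[:k]   # highest similarity first
```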
- Produce the image descriptions
  - We already include the generated image descriptions and the extracted texts from images in `/generated_descriptions` and `/extracted_texts_img`, respectively
  - To produce the above on your own, run `python extract_text_from_img.py` and `python generate_description_from_img.py`
- Run the Flan-T5 train, generation, and scoring scripts:
  - `python QG-T5-train.py`
  - `python QG-T5-inference.py`
  - `python QG-T5-scoring.py`
  - Please refer to each script for the many configurable options
- Run the OpenAI generation and scoring scripts
  - Fill in the OpenAI API key in the `QG-openai-inference.py` script
  - `python QG-openai-inference.py`
  - `python QG-openai-scoring.py`
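Few-shot in-context prompting, as used by the OpenAI inference script, can be sketched as simple prompt assembly; the exact prompt format and the helper name `build_fewshot_prompt` are assumptions, not the script's actual template:

```python
def build_fewshot_prompt(examples: list[tuple[str, str]], query_context: str) -> str:
    """Concatenate (context, question) exemplars, then the query context.

    The model is expected to continue the final "Question:" line with a
    generated question grounded in query_context.
    """
    parts = [f"Context: {ctx}\nQuestion: {q}" for ctx, q in examples]
    parts.append(f"Context: {query_context}\nQuestion:")
    return "\n\n".join(parts)
```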