This repo hosts the code for the paper, Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering, presented at EMNLP 2022.
Model: Our Rainier model is now available on the Hugging Face model hub! [policy] [value]
Usage: Please see Rainier's Hugging Face model card.
Knowledge: We release the commonsense datasets augmented with Rainier-generated knowledge.
You can download the knowledge_rainier.json
file from our Google Drive folder.
Create and activate the Conda environment:
conda env create -f environment.yml
conda activate rainier
Install gsutil.
Download the Rainier model: Go to /model/
and run gdown 1qmxFTENNITA16_54dkqR6pHMDofa3Jee
Alternatively, you can download the rainier-large.pth
file from our Google Drive folder and put it under /model/
Download the UQA data: Go to /data/
and run python download_uqa.py
Download the non-UQA data: Go to /data/
and run gdown 1vfJQnqeRzr9MXPQmtbrAsQUuWZD1bZqF
Alternatively, you can download the non-uqa.zip
file from our Google Drive folder, put it under /data/
and unzip it. Make sure the 4 individual folders are directly under /data/
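Before running anything, a quick check that the downloads landed where the steps above expect them can save a failed run. The following is a minimal sketch, not part of the repo: `check_setup` is a hypothetical helper, and the wrapper folder name `non-uqa` is an assumption about what unzipping `non-uqa.zip` might produce.

```python
from pathlib import Path


def check_setup(repo_root="."):
    """Return a list of problems found; an empty list means the basics look fine."""
    root = Path(repo_root)
    problems = []

    # Checkpoint location named in the download step above.
    ckpt = root / "model" / "rainier-large.pth"
    if not ckpt.is_file():
        problems.append(f"missing checkpoint: {ckpt}")

    data = root / "data"
    if not data.is_dir():
        problems.append(f"missing data directory: {data}")
    elif (data / "non-uqa").is_dir():
        # Assumption: unzipping non-uqa.zip may leave a wrapper folder called
        # "non-uqa"; its contents should be moved up one level into data/.
        problems.append(f"nested folder found: {data / 'non-uqa'}")

    return problems
```

Calling `check_setup("..")` from the /rainier/ directory would return an empty list when the checkpoint is in place and no wrapper folder is left under /data/.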
Running inference requires a GPU with at least 22G memory.
If your GPU has less memory than that, consider parallelizing across multiple GPUs or using a smaller --batch_size.
To run inference with the default setting, go to the /rainier/
directory and run
python main.py --mode eval
This will evaluate the dev split of all seen and unseen datasets, with Rainier-large as the knowledge introspector and UnifiedQA-large as the QA model.
You can view the output knowledge in /model/knowledge/ and the inference results in /model/inference/.
Some flags you can set (see the full list in args.py):
--eval_split [dev|test] The dataset split you want to evaluate. Some test sets do not have gold labels, so we provide utility scripts to convert the inference results to leaderboard submission files.
--eval_tasks [task-list] Please choose a subset from the full list (which is also the default value): obqa,arc_e,arc_h,ai2sci_e,ai2sci_m,csqa,qasc,piqa,siqa,wg,numersense,riddlesense,quartz,hellaswag. Write your choice as a comma-separated list.
--eval_baseline Additionally evaluate the no-knowledge baseline.
--eval_ckpt [path] The path to Rainier model ckpt. The default value is ../model/rainier-large.pth
--load_from_ckpt [path] This loads the Rainier model ckpt from a raw training ckpt file, and overrides the --eval_ckpt parameter.
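To avoid typos in the task list, the evaluation command can be assembled programmatically. This is a rough sketch using only the flags listed above; `ALL_EVAL_TASKS` and `build_eval_command` are hypothetical names, and treating `--eval_baseline` as a switch that takes no value is an assumption (check args.py to confirm).

```python
# Full task list, copied from the --eval_tasks description above.
ALL_EVAL_TASKS = [
    "obqa", "arc_e", "arc_h", "ai2sci_e", "ai2sci_m", "csqa", "qasc",
    "piqa", "siqa", "wg", "numersense", "riddlesense", "quartz", "hellaswag",
]


def build_eval_command(tasks=None, split="dev", baseline=False):
    """Build the argv list for `python main.py --mode eval` with optional flags."""
    if split not in ("dev", "test"):
        raise ValueError(f"split must be 'dev' or 'test', got {split!r}")

    cmd = ["python", "main.py", "--mode", "eval", "--eval_split", split]

    if tasks is not None:
        unknown = [t for t in tasks if t not in ALL_EVAL_TASKS]
        if unknown:
            raise ValueError(f"unknown tasks: {unknown}")
        # The flag expects a comma-separated list.
        cmd += ["--eval_tasks", ",".join(tasks)]

    if baseline:
        # Assumption: --eval_baseline is a boolean switch with no argument.
        cmd.append("--eval_baseline")

    return cmd
```

For example, `" ".join(build_eval_command(["csqa", "qasc"], split="test"))` gives a command string that can be pasted into a shell from the /rainier/ directory.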
The Rainier model is trained in two stages.
We trained Stage I using 1x RTX6000 GPU with 24G memory.
If you would like to skip this training stage, you can download a copy of our ckpt.
Go to /model/
and run gdown 1PeL3E7UreVIHKOkLNSyzgyAYoab-MA5N
Alternatively, you can download the rainier-large_stageI.pth
file from our Google Drive folder and put it under /model/
First, generate silver knowledge from GPT-3.
If you would rather not generate it yourself, you can download a copy of our pre-generated knowledge.
Go to /data/
and run gdown 1V6Za8BfEwWa4xRgXcVEFhS8tWepHZPAw
Alternatively, you can download the knowledge_gkp.zip
file from our Google Drive folder, unzip it and put it under /data/
Otherwise, you can generate the knowledge yourself by going to the /rainier/ directory and running
sh generate_knowledge_gkp.sh
Remember to set the OPENAI_API_KEY environment variable beforehand, and be ready to spend a lot of money ;)
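Since the generation script calls a paid API, it can help to fail fast when the key is missing rather than partway through a run. A minimal sketch, assuming nothing about how the repo itself reads the key; `require_env` is a hypothetical helper.

```python
import os


def require_env(name="OPENAI_API_KEY", environ=None):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before generating knowledge")
    return value
```

Calling `require_env()` at the top of a wrapper script stops the run before any request is sent if the variable is missing.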
Then, you can start Stage I training by going to the /rainier/ directory and running
python imitation.py
This will train on all seen datasets, using silver knowledge as supervision.
You can track the training in Tensorboard.
The best model ckpt will be saved under /runs/imitation/.
Make sure to run python extract_model_from_ckpt_stageI.py ../runs/imitation/[path-to-best].ckpt
before proceeding to the next stage.
This extracts the model state dict and puts it at /model/rainier-large_stageI.pth
We trained Stage II using 8x RTX6000 GPUs, each with 24G memory.
To train Stage II with the default setting, go to the /rainier/
directory and run
python main.py --mode train
This will train Rainier on all seen datasets, with UnifiedQA-large as the QA model.
You can track the training in Tensorboard, and view the (dev set) output knowledge in /runs/[path-to-save-dir]/knowledge/ and the inference results in /runs/[path-to-save-dir]/inference/.
Some flags you can set (see the full list in args.py):
--train_tasks [task-list] Please choose a subset from the full list (which is also the default value): obqa,arc_e,arc_h,ai2sci_e,ai2sci_m,csqa,qasc,piqa,siqa,wg. Write your choice as a comma-separated list.
--eval_baseline Additionally evaluate the no-knowledge baseline.
--model_ckpt [path] The path to stage I model ckpt. The default value is ../model/rainier-large_stageI.pth
--load_from_ckpt [path] This resumes training from an existing ckpt.
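The training list is a subset of the evaluation list, so the datasets that appear only at evaluation time can be read off as the difference between the two. The sketch below copies both lists from the flag descriptions in this README; reading that difference as the "unseen" datasets is an inference from those lists, and the variable names are illustrative.

```python
# Default --eval_tasks list (seen and unseen datasets).
EVAL_TASKS = [
    "obqa", "arc_e", "arc_h", "ai2sci_e", "ai2sci_m", "csqa", "qasc",
    "piqa", "siqa", "wg", "numersense", "riddlesense", "quartz", "hellaswag",
]

# Default --train_tasks list (seen datasets only).
TRAIN_TASKS = [
    "obqa", "arc_e", "arc_h", "ai2sci_e", "ai2sci_m", "csqa", "qasc",
    "piqa", "siqa", "wg",
]

# Datasets evaluated but never trained on, in evaluation order.
unseen_tasks = [t for t in EVAL_TASKS if t not in TRAIN_TASKS]
print(",".join(unseen_tasks))  # numersense,riddlesense,quartz,hellaswag
```

The printed string is already in the comma-separated form that --eval_tasks expects.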
Make sure to run python extract_model_from_ckpt_stageII.py --load_from_ckpt ../runs/[path-to-best].pth
after the training, so that you can use the trained Rainier model for inference.
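Putting the two stages together, the order of commands looks roughly like the sketch below. `training_pipeline` is a hypothetical helper that only lists the commands; the checkpoint paths are arguments because the best checkpoint is only known once each stage has finished, so the steps cannot simply be run back to back.

```python
def training_pipeline(stage1_ckpt, stage2_ckpt):
    """Return the training commands in order, each as an argv list.

    stage1_ckpt: best checkpoint written by imitation.py (under ../runs/imitation/).
    stage2_ckpt: best checkpoint written by Stage II training (under ../runs/).
    All commands are meant to be run from the /rainier/ directory.
    """
    return [
        # Stage I: imitation learning on silver knowledge.
        ["python", "imitation.py"],
        # Extract the Stage I state dict to /model/rainier-large_stageI.pth.
        ["python", "extract_model_from_ckpt_stageI.py", stage1_ckpt],
        # Stage II training with default settings.
        ["python", "main.py", "--mode", "train"],
        # Extract the final model so it can be used for inference.
        ["python", "extract_model_from_ckpt_stageII.py", "--load_from_ckpt", stage2_ckpt],
    ]
```

Printing `" ".join(cmd)` for each entry gives a checklist to follow by hand, filling in the real checkpoint paths as they become available.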
If you find this repo useful, please cite our paper:
@article{Liu2022RainierRK,
title={Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering},
author={Jiacheng Liu and Skyler Hallinan and Ximing Lu and Pengfei He and Sean Welleck and Hannaneh Hajishirzi and Yejin Choi},
journal={ArXiv},
year={2022},
volume={abs/2210.03078},
url={https://api.semanticscholar.org/CorpusID:252735191}
}