MMM-MCQA

Source code for our "MMM" paper at AAAI 2020: Jin, Di, Shuyang Gao, Jiun-Yu Kao, Tagyoung Chung, and Dilek Hakkani-tur. "MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension." AAAI (2020). If you use the code, please cite the paper:

@article{jin2019mmm,
  title={MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension},
  author={Jin, Di and Gao, Shuyang and Kao, Jiun-Yu and Chung, Tagyoung and Hakkani-tur, Dilek},
  journal={arXiv preprint arXiv:1910.00458},
  year={2019}
}

Requirements

Python packages

  • PyTorch
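If you do not have it installed yet, a typical way to get it (this repo does not pin a version, so adjust to your CUDA setup) is:

pip install torch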

Usage

  1. All five MCQA datasets are provided in the folder "data". To unzip the RACE data, run the following command:
tar -xf RACE.tar.gz
  2. To train the BERT model (including base and large versions), use the following command:
python run_classifier_bert_exe.py TASK_NAME MODEL_DIR BATCH_SIZE_PER_GPU GRADIENT_ACCUMULATION_STEPS

Here we explain each required argument in detail (an example invocation follows the list):

  • TASK_NAME: It can be a single task or multiple tasks. For a single task, the options are: dream, race, toefl, mcscript, mctest160, mctest500, mnli, snli, etc. Multiple tasks can be any combination of the above single tasks. For example, to train a multi-task model on the dream and race tasks together, set this variable to "dream,race".
  • MODEL_DIR: The model will be initialized with the parameters stored in this directory.
  • BATCH_SIZE_PER_GPU: Batch size of the data on a single GPU.
  • GRADIENT_ACCUMULATION_STEPS: The number of batches over which gradients are accumulated before each back-propagation update.
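For example, a hypothetical invocation (the model directory and batch settings here are placeholders, not values from the paper) that trains on dream and race together would look like:

python run_classifier_bert_exe.py dream,race /path/to/bert_base_model 4 3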

One note: the effective training batch size matters; it is the product of three variables: BATCH_SIZE_PER_GPU, the number of GPUs, and GRADIENT_ACCUMULATION_STEPS. For example, 2 GPUs with BATCH_SIZE_PER_GPU=4 and GRADIENT_ACCUMULATION_STEPS=3 give an effective batch size of 2 × 4 × 3 = 24. In my experience, it should be at least 12, and 24 works well.
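If the mechanics of gradient accumulation are unfamiliar, here is a minimal PyTorch sketch (illustrative only, not this repo's training loop; the model and data are toy stand-ins):

import torch
import torch.nn as nn

# Toy stand-ins for the real model and data loader.
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(6)]

accumulation_steps = 3  # plays the role of GRADIENT_ACCUMULATION_STEPS
optimizer.zero_grad()
for step, (inputs, labels) in enumerate(batches):
    # Scale the loss so the accumulated gradient equals the average over the window.
    loss = loss_fn(model(inputs), labels) / accumulation_steps
    loss.backward()  # gradients add up in the .grad buffers until cleared
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one parameter update per accumulation window
        optimizer.zero_grad()

# Effective batch size here: 4 (per batch) × 1 (GPU) × 3 (accumulation) = 12.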

  3. To train the RoBERTa model (including base and large versions), use the following command:
python run_classifier_roberta_exe.py TASK_NAME MODEL_DIR BATCH_SIZE_PER_GPU GRADIENT_ACCUMULATION_STEPS
  4. To facilitate your use of this code, I provide the trained model parameters for some settings:
Model Type    | Fine-tune steps         | Download Links
------------- | ----------------------- | --------------
BERT-Base     | MNLI,SNLI -> DREAM,RACE | Link
BERT-Large    | MNLI,SNLI -> DREAM,RACE | Link
RoBERTa-Large | MNLI,SNLI -> DREAM,RACE | Link
BERT-Base     | MNLI,SNLI               | Link
BERT-Large    | MNLI,SNLI               | Link
RoBERTa-Large | MNLI,SNLI               | Link
BERT-Large    | RACE                    | Link
RoBERTa-Large | RACE                    | Link
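These checkpoints can serve as the MODEL_DIR for a later fine-tuning stage. For example, a hypothetical second-stage run (the directory name is a placeholder for wherever you unpack the download) that starts from the MNLI,SNLI RoBERTa-Large checkpoint and fine-tunes on dream and race:

python run_classifier_roberta_exe.py dream,race ./roberta_large_mnli_snli 2 6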