# Applying Mixture of Experts Technique on Small-Size Language Models for Multi-Step Mathematical Reasoning Problems
- Zhiyi Chen (zhiychen)
- Yaqi Qin (yaqqin)
- Tianyang Xu (tianyxu)
- Daiwei Zhang (daizhang)
This repository contains the code and preprocessed data we used for our project in the course CSNLP.

## Setup
```bash
git clone https://github.com/sally-xu-42/CSNLP_MoE_MathReasoning.git
cd CSNLP_MoE_MathReasoning
python3 -m venv csnlp_venv
source csnlp_venv/bin/activate
module load gcc/8.2.0 python_gpu/3.8.5 eth_proxy  # only needed on the ETH Euler cluster
pip install -r requirements.txt
wandb login
```
## Data

The original data is adapted from the Socratic version of the GSM8K dataset. The methods in `data_prep.py` preprocess the training and test sets and decompose them by the number of reasoning steps. We use two data formats. All preprocessed data in the first format is in the `dataset` directory, where each problem looks like:

```json
{"context": "c. main-q.", "subquestions": "q1? q2? q3?", "subanswers": "a1. a2. a3"}
```
Data in the second format is in the `dataset_dialog` directory and looks like:

```json
{"context": "c", "main-q": "q", "qa-pairs": [["q1?", "a1."], ["q2?", "a2."], ...], "answer": "a"}
```
Note that all equations in the sub-answers are written in the form `a+b=<<a+b=c>>c` for convenient evaluation.
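As a point of reference, here is a minimal sketch of how answers in this calculator-annotation format can be extracted, and how the step count of a dialog-format example can be determined. The helper names are ours for illustration and are not necessarily those used in `data_prep.py` or the evaluation scripts:

```python
import re

# GSM8K-style calculator annotation: "a+b=<<a+b=c>>c".
# The token between "=" and ">>" is the result of the equation.
CALC = re.compile(r"<<([^>]*)>>")

def extract_final_answer(subanswers: str) -> str:
    """Return the result of the last annotated equation in a sub-answer string."""
    equations = CALC.findall(subanswers)  # e.g. ["3+4=7", "7-2=5"]
    return equations[-1].split("=")[-1].strip() if equations else ""

def num_steps(example: dict) -> int:
    """Number of reasoning steps of a dialog-format example (one per QA pair)."""
    return len(example["qa-pairs"])

print(extract_final_answer("She sold 48/2=<<48/2=24>>24 clips in May."))  # -> "24"
```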
## Training

The config file for `train.py` is `configs/train.yaml`. Set `repo_dir` to your working directory; set `checkpoint_dir` to `"checkpoints/"` if you want to save the trained model under the working directory, otherwise specify a path of your choice.
To run experiments with GPT-2 (124M parameters), make sure that

```yaml
model: "gpt2"
dataset_path: "dataset"
```

If experimenting with DialoGPT, set

```yaml
model: "microsoft/DialoGPT-small"
dataset_path: "dataset_dialog"
```

Most importantly, set `num_steps` in `train.yaml` to an integer from 2 to 8 to train the expert model for that number of reasoning steps, or to `None` to train a baseline model.
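Put together, a plausible `train.yaml` for training, say, the 3-step DialoGPT expert could look like the sketch below. Only the fields discussed above are shown; the actual file may contain additional hyperparameters, so treat this as an illustration rather than a complete config:

```yaml
repo_dir: "/path/to/CSNLP_MoE_MathReasoning"  # your working directory
checkpoint_dir: "checkpoints/"                # where trained models are saved
model: "microsoft/DialoGPT-small"             # or "gpt2"
dataset_path: "dataset_dialog"                # use "dataset" when model is "gpt2"
num_steps: 3                                  # 2-8 for an expert, None for the baseline
```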
To run locally, simply execute `python train.py`; otherwise, submit a job to the batch system with `sbatch jobs/train` after updating the Python venv path in the script.
## Inference

For GPT-2 inference, simply point `checkpoint_path` and `test_data_path` to a trained expert model and its corresponding dataset, then run `python eval_gpt2.py`. The outputs are saved to a CSV file and the final accuracy is reported.
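If you want to re-derive the accuracy from the saved file yourself, something like the following works. The column names here are hypothetical; check the header of the CSV that `eval_gpt2.py` actually writes:

```python
import pandas as pd

# "prediction" and "target" are assumed column names -- inspect the CSV first.
df = pd.read_csv("outputs.csv")
accuracy = (df["prediction"].astype(str) == df["target"].astype(str)).mean()
print(f"accuracy: {accuracy:.4f}")
```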
For the DialoGPT GT inference described in the report, set the three parameters `curr_dir`, `ckpt_pth`, and `data_pth` to your own paths and run `python eval_dialogpt.py n`, where `n` is the number of reasoning steps you want to evaluate (e.g. `python eval_dialogpt.py 3` for the 3-step test set).
For DialoGPT's iterative inference, please refer to this Colab notebook for an interactive illustration.
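In essence, iterative inference answers one sub-question at a time and feeds each generated answer back into the dialogue history before asking the next sub-question. The sketch below illustrates this loop with the off-the-shelf `microsoft/DialoGPT-small` checkpoint and hard-coded sub-questions; in the actual experiments, a fine-tuned expert checkpoint and sub-questions from `dataset_dialog` would be used instead:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
model.eval()

context = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May."
subquestions = [
    "How many clips did Natalia sell in May?",
    "How many clips did Natalia sell altogether in April and May?",
]

# The dialogue history starts with the problem context; turns are separated by EOS.
history = tokenizer.encode(context + tokenizer.eos_token, return_tensors="pt")

for question in subquestions:
    q_ids = tokenizer.encode(question + tokenizer.eos_token, return_tensors="pt")
    history = torch.cat([history, q_ids], dim=-1)
    prompt_len = history.shape[-1]
    with torch.no_grad():
        history = model.generate(
            history,
            max_new_tokens=64,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Everything generated after the prompt is the answer to this sub-question,
    # and it stays in `history` as context for the next one.
    answer = tokenizer.decode(history[0, prompt_len:], skip_special_tokens=True)
    print(f"Q: {question}\nA: {answer}")
```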