This repository provides scripts for pretraining, fine-tuning, and evaluating transformer models on datasets such as CoLA, SST-2, QQP, MRPC, QNLI, RTE, IMDB, and Yelp.
Install the dependencies with:

```bash
pip install -r requirements.txt
```
To pretrain a model, use the `train.py` script:

```bash
python train.py --model_size {1m|33m} --num_generations 40 --num_stories 1000000 --batch_size 2000
```
- `--model_size`: Choose `"1m"` or `"33m"`.
- `--num_generations`, `--num_stories`, `--batch_size`: Configure the pretraining run (see the example below).
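For instance, a pretraining run for the 1m model could be launched as follows; the flag values simply mirror the template above and are illustrative rather than prescriptive:

```bash
# Example: pretrain the 1m model for 40 generations on 1,000,000 stories with a batch size of 2000.
python train.py --model_size 1m --num_generations 40 --num_stories 1000000 --batch_size 2000
```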
Use `run_experiment.py` to fine-tune and evaluate models:

```bash
python run_experiment.py --dataset <dataset_name> --fine_tune_dir <dir> --eval_results_dir <dir> --pretrained_models_dir <dir> --tokenizer_path <path> --num_generations 40
```
- `--dataset`: Dataset to use (`cola`, `sst2`, `qqp`, etc.); see the filled-in example below.
- `--fine_tune_dir`, `--eval_results_dir`, `--pretrained_models_dir`, `--tokenizer_path`: Specify the relevant paths.
- `--num_generations`: Number of fine-tuning generations.
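For example, a CoLA run might look like the following. The directory and tokenizer paths here are placeholders; substitute the locations used in your own setup:

```bash
# Hypothetical paths -- replace with your own directories and tokenizer file.
python run_experiment.py \
  --dataset cola \
  --fine_tune_dir ./fine_tuned_models \
  --eval_results_dir ./eval_results \
  --pretrained_models_dir ./pretrained_models \
  --tokenizer_path ./tokenizer.json \
  --num_generations 40
```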
To evaluate the perplexity (PPL) of models across generations and plot the results:

- Run the evaluation for both the 1m and 33m models by executing the following in your terminal:

  ```bash
  python evaluate_ppl.py
  ```

- Plot the PPL comparison: the `evaluate_ppl.py` script automatically generates a plot saved as `PPL_comparison.png`. It shows the PPL across generations for both model sizes, letting you compare their performance visually (the standard PPL definition is given below for reference).
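Perplexity is conventionally the exponential of the average per-token negative log-likelihood; assuming `evaluate_ppl.py` follows this standard definition, the plotted quantity for a held-out sequence of $N$ tokens is

$$
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\bigl(x_i \mid x_{<i}\bigr)\right)
$$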
To automate fine-tuning across all datasets:

```bash
python run_downstreamtasks.py
```
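If you prefer to drive the runs from the shell, a loop over the datasets achieves something similar. This is only a sketch of what `run_downstreamtasks.py` presumably automates: the paths are placeholders, and the dataset identifiers beyond `cola`, `sst2`, and `qqp` are lowercase guesses based on the dataset list at the top of this README.

```bash
# Sketch: fine-tune and evaluate on each dataset in turn.
# Paths are placeholders; dataset names other than cola/sst2/qqp are assumed.
for ds in cola sst2 qqp mrpc qnli rte imdb yelp; do
  python run_experiment.py \
    --dataset "$ds" \
    --fine_tune_dir ./fine_tuned_models \
    --eval_results_dir ./eval_results \
    --pretrained_models_dir ./pretrained_models \
    --tokenizer_path ./tokenizer.json \
    --num_generations 40
done
```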
Or use the shell script for a single experiment:

```bash
bash run_fintune.sh
```