
Training Open Instruction-following Language Models

This is the repository for the paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources.

We explore instruction-tuning popular base models on publicly available datasets. This repository contains:

  1. The training code used to train all of our models.
  2. The evaluation code for the evaluations done in the paper.
  3. Scripts for merging and creating model diffs.

As part of this work we introduce Tülu, a suite of LLaMa models fully finetuned on a strong mix of datasets!

Tülu 65B is the strongest model we built, and it is available here. See below for how to make use of this model yourself!

Setup

You can install the required packages by running the following command (after installing pytorch):

pip install -r requirements.txt

If you just want the dependencies for the weight diff script, use:

pip install -r weight-diff-requirements.txt
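After installation, you can sanity-check the environment with a short snippet. This is a minimal sketch; it assumes torch (installed separately, as noted above) and transformers (pinned in requirements.txt) are present:

```python
# Quick sanity check that the core dependencies are importable.
# Assumes torch (installed separately) and transformers (from requirements.txt).
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```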

Model preparation

To get the LLaMa checkpoints, please request access from Meta here, and consult the huggingface documentation for converting them to a huggingface-compatible format.

Generally, most huggingface-compatible models should work fine, potentially with some adjustments for different tokenizers, etc.
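For example, a converted checkpoint should load with the standard transformers API. A minimal sketch, where the local path is hypothetical:

```python
# Minimal sketch: load a converted, huggingface-compatible LLaMa checkpoint.
# "path/to/llama-7b-hf" is a hypothetical path to your converted model.
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_llama_path = "path/to/llama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(hf_llama_path)
model = AutoModelForCausalLM.from_pretrained(hf_llama_path)
print(model.config.model_type)  # should report "llama"
```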

Weight Diff Script

We use a slightly modified form of the Alpaca weight diff script, which works in the same way.

To merge a model:

  1. Download the relevant LLaMa model and convert it to huggingface format (see above).
  2. Download our repository and install the right dependencies (see above).
  3. Download the model diff you want.
  4. Run the command below:
python scripts/weight_diff.py recover --path_raw ${hf_llama_path} --path_tuned ${output_path} --path_diff ${diff_location}
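Once recovered, the model at the --path_tuned location is a standard huggingface checkpoint. Below is a minimal generation sketch; the path is illustrative, and the <|user|>/<|assistant|> prompt format is an assumption you should verify against the format a given model was trained with:

```python
# Minimal sketch: generate with a recovered checkpoint.
# The path is the same one passed as --path_tuned above; the
# <|user|>/<|assistant|> prompt format is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

output_path = "path/to/recovered-model"
tokenizer = AutoTokenizer.from_pretrained(output_path)
model = AutoModelForCausalLM.from_pretrained(
    output_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "<|user|>\nWhat is instruction tuning?\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```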

Training

Dataset Preparation

To download and prepare the instruction datasets we explore, use:

./scripts/prepare_train_data.sh

Please check these datasets for licenses and restrictions around their use!

Finetuning

To run instruction tuning, you can use the following command:

./scripts/finetune_with_accelerate.sh

Adjust model_name_or_path, tokenizer_name, train_file, and output_dir to match your model, data, and setup. By default, this uses DeepSpeed with accelerate. A sketch of a possible train_file follows below.
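For reference, here is a hedged sketch of preparing a small train_file in a chat-style JSONL format. The field names ("dataset", "id", "messages") are assumptions about the schema; check the output of ./scripts/prepare_train_data.sh for the exact format the training code expects:

```python
# Hypothetical sketch: write a tiny chat-format JSONL train_file.
# Field names are assumptions - confirm against prepare_train_data.sh output.
import json

examples = [
    {
        "dataset": "my_dataset",
        "id": "example_0",
        "messages": [
            {"role": "user", "content": "What is instruction tuning?"},
            {"role": "assistant", "content": "Finetuning a language model on instruction-response pairs so that it follows natural-language instructions."},
        ],
    }
]

with open("my_train_file.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```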

Model Checkpoints

We provide a number of model checkpoints as weight diffs. You can find them all on huggingface here, and they are listed below:

| Model | 7B | 13B | 30B | 65B |
|---|---|---|---|---|
| SuperNI | link | link | | |
| CoT | link | link | | |
| Flan V2 | link | link | | |
| Dolly | link | link | | |
| Open Assistant 1 | link | link | | |
| ShareGPT | link | link | link | link |
| Self-instruct (original) | link | link | | |
| Unnatural Instructions | link | link | | |
| Alpaca | link | link | | |
| Code-Alpaca | link | link | | |
| GPT4-Alpaca | link | link | | |
| Baize | link | link | | |
| Human-Mix | link | link | link | link |
| Tulu | link | link | link | link |

Pythia and OPT models (and more...?) coming soon!

Evaluation

First, run the following script to download all the evaluation datasets:

./scripts/prepare_eval_data.sh

Evaluation scripts for the different datasets are located under ./scripts. For example, you can use the following command to run the MMLU evaluation:

./scripts/eval/mmlu.sh

AlpacaFarm

To run the AlpacaFarm evaluation, please make sure you have installed our fork of AlpacaFarm (https://github.com/hamishivi/alpaca_farm), and then use the following script:

python eval/alpaca_farm_eval.py --model <model> --batch_size 8

Please check the script itself for more details!

Human Evaluation Interface

Coming soon!

Citation

If you use this repository or our models, please cite our work:

@misc{wang2023far,
      title={How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources}, 
      author={Yizhong Wang and Hamish Ivison and Pradeep Dasigi and Jack Hessel and Tushar Khot and Khyathi Raghavi Chandu and David Wadden and Kelsey MacMillan and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
      year={2023},
      eprint={2306.04751},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}