This is the repository for the paper *How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources*.
We explore instruction-tuning popular base models on publicly available datasets. This repository contains:
- Training code used to train all models.
- Evaluation code for the evaluation done in the paper.
- Scripts for creating and merging model diffs.
As part of this work we introduce Tülu, a suite of LLaMa models fully-finetuned on a strong mix of datasets!
Tülu 65B is the strongest model we built and is available here - see below for how to make use of this model yourself!
You can install the required packages by running the following command (after installing pytorch):
pip install -r requirements.txt
If you just want the dependencies for the weight diff script, use:
pip install -r weight-diff-requirements.txt
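After installing, a quick sanity check that the core dependencies import cleanly can save debugging time later. A minimal sketch:

```python
# Quick environment check after installing the requirements.
import torch
import transformers

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
```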
To get LLaMa checkpoints, please acquire them via Meta here and consult the Hugging Face documentation for converting them to a huggingface-compatible format.
Generally, most huggingface-compatible models should work fine, potentially with some adjustments for different tokenizers, etc.
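Once converted, it is worth confirming that the checkpoint loads with the standard Hugging Face APIs before training on it. A minimal sketch; the path below is a placeholder for your converted checkpoint:

```python
# Sanity-check a converted LLaMa checkpoint (path is a placeholder).
from transformers import AutoConfig, AutoTokenizer

hf_llama_path = "path/to/llama-7b-hf"  # placeholder: your converted checkpoint
config = AutoConfig.from_pretrained(hf_llama_path)
tokenizer = AutoTokenizer.from_pretrained(hf_llama_path)
print(config.model_type, config.hidden_size)
print(tokenizer("Instruction tuning is")["input_ids"])
```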
We use a slightly modified form of the Alpaca weight diff script, which works in the same way.
To merge a model:
- Download the relevant LLaMa model and convert it to Hugging Face format (see above).
- Download our repository and install the right dependencies (see above).
- Download the model diff you want.
- Run the command below:
python scripts/weight_diff.py recover --path_raw ${hf_llama_path} --path_tuned ${output_path} --path_diff ${diff_location}
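Once recovery finishes, you can load the merged model with the standard Hugging Face APIs for a quick smoke test. A minimal sketch; the output path is whatever you passed as `--path_tuned`, and the chat-style prompt template shown is an assumption - check the exact template used by the model you recovered:

```python
# Smoke-test a recovered model (path is whatever you passed to --path_tuned).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

output_path = "path/to/recovered-tulu-7b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(output_path)
model = AutoModelForCausalLM.from_pretrained(
    output_path, torch_dtype=torch.float16, device_map="auto"
)

# Prompt format is an assumption -- adjust to the template the model was trained with.
prompt = "<|user|>\nWhat is instruction tuning?\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```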
To download and prepare the instruction datasets we explore, use:
./scripts/prepare_train_data.sh
Please check these datasets for licenses and restrictions around their use!
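If you want to confirm what the script produced before training, a small inspection snippet helps. This is a sketch that assumes the processed files are JSON Lines somewhere under `data/`; the exact output layout is determined by the script, so adjust the glob as needed:

```python
# Peek at the prepared training data; the output directory is an assumption.
import glob
import json

files = sorted(glob.glob("data/**/*.jsonl", recursive=True))
print(f"found {len(files)} .jsonl files")
if files:
    with open(files[0]) as f:
        example = json.loads(f.readline())
    print(files[0], "->", sorted(example.keys()))
```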
To run instruction tuning, you can use the following command:
./scripts/finetune_with_accelerate.sh
Adjust `model_name_or_path`, `tokenizer_name`, `train_file`, and `output_dir` to your models / data / setting. By default, this uses `deepspeed` with `accelerate`.
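Before launching a long run, it can help to gauge how large your training set is in tokens. A rough sketch, assuming `train_file` is a JSON Lines file; the paths and the way text fields are gathered are assumptions, so adjust them to your data schema:

```python
# Rough token count for a training file; paths are placeholders.
import json
from transformers import AutoTokenizer

train_file = "data/my_train_file.jsonl"                           # placeholder
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b-hf")  # placeholder

n_examples, n_tokens = 0, 0
with open(train_file) as f:
    for line in f:
        example = json.loads(line)
        n_examples += 1
        # Concatenate all top-level string fields; adjust to your schema.
        text = " ".join(v for v in example.values() if isinstance(v, str))
        n_tokens += len(tokenizer(text)["input_ids"])
print(f"{n_examples} examples, ~{n_tokens} tokens")
```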
We provide a number of model checkpoints as diffs. You can find them on Hugging Face here. They are also all listed in the table below:
| Model | 7B | 13B | 30B | 65B |
|---|---|---|---|---|
| SuperNI | link | link | | |
| CoT | link | link | | |
| Flan V2 | link | link | | |
| Dolly | link | link | | |
| Open Assistant 1 | link | link | | |
| ShareGPT | link | link | link | link |
| Self-instruct (original) | link | link | | |
| Unnatural Instructions | link | link | | |
| Alpaca | link | link | | |
| Code-Alpaca | link | link | | |
| GPT4-Alpaca | link | link | | |
| Baize | link | link | | |
| Human-Mix | link | link | link | link |
| Tulu | link | link | link | link |
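To fetch one of these diffs programmatically before running the recovery step above, you can use the Hugging Face Hub client. A minimal sketch; the repo id below is a placeholder, so substitute the id of the diff you want from the table:

```python
# Download a model diff from the Hugging Face Hub; repo_id is a placeholder.
from huggingface_hub import snapshot_download

diff_location = snapshot_download(
    repo_id="allenai/<diff-repo-name>",   # placeholder: pick a diff from the table
    local_dir="model_diffs/my-diff",      # where to store the files locally
)
print(diff_location)
```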
We also trained Pythia and OPT models on the Tulu mixture (aka the Human+GPT mixture), and they are available here.
First, run the following script to download all the evaluation datasets:
./scripts/prepare_eval_data.sh
Evaluation scripts for different datasets are located under `./scripts/eval`. For example, you can use the following command to run the MMLU evaluation script:
./scripts/eval/mmlu.sh
To run AlpacaFarm eval, please make sure you install our fork of AlpacaFarm (https://github.com/hamishivi/alpaca_farm) and use the following script:
python eval/alpaca_farm_eval.py --model <model> --batch_size 8
Please check the script itself for more details!
Coming soon!
This codebase is licensed under Apache 2.0 as given in `LICENSE`.
The license we use for the models released (along with the base model licenses) can be found in `model_licenses/tulu_license.txt` - just replace `<MODELNAME>` with the actual model name (i.e., the name on Hugging Face).
If you used this repository or our models, please cite our work:
@misc{wang2023far,
title={How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources},
author={Yizhong Wang and Hamish Ivison and Pradeep Dasigi and Jack Hessel and Tushar Khot and Khyathi Raghavi Chandu and David Wadden and Kelsey MacMillan and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
year={2023},
eprint={2306.04751},
archivePrefix={arXiv},
primaryClass={cs.CL}
}