YaRN
This repo contains the code and data for the YaRN context window extension method.
Preprint
Preprint (arXiv): YaRN: Efficient Context Window Extension of Large Language Models
Mistakes caught by our readers are listed here (thank you!): Errata.md
v2 of the preprint will be published on arXiv with all the corrections.
Models
We publish 7B and 13B variants of Llama 2 fine-tuned with YaRN for 64K and 128K context window lengths. They are available under the Llama 2 license on 🤗 Hugging Face.
Size | Context | Link |
---|---|---|
7B | 64K | NousResearch/Yarn-Llama-2-7b-64k |
7B | 128K | NousResearch/Yarn-Llama-2-7b-128k |
13B | 64K | NousResearch/Yarn-Llama-2-13b-64k |
13B | 128K | NousResearch/Yarn-Llama-2-13b-128k |
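As a rough sketch of how one of the checkpoints in the table might be loaded with the Hugging Face transformers library: the repo ids below are copied from the table, while the helper names `yarn_repo_id` and `load_yarn_model` are hypothetical and exist only for this illustration. The use of `trust_remote_code=True` is an assumption that the checkpoints ship custom modeling code; check the model cards before relying on it.

```python
# Repo ids are taken from the table above; everything else is illustrative.
YARN_MODELS = {
    ("7b", "64k"): "NousResearch/Yarn-Llama-2-7b-64k",
    ("7b", "128k"): "NousResearch/Yarn-Llama-2-7b-128k",
    ("13b", "64k"): "NousResearch/Yarn-Llama-2-13b-64k",
    ("13b", "128k"): "NousResearch/Yarn-Llama-2-13b-128k",
}


def yarn_repo_id(size: str, context: str) -> str:
    """Return the Hugging Face repo id for a size/context pair (hypothetical helper)."""
    key = (size.lower(), context.lower())
    if key not in YARN_MODELS:
        raise ValueError(
            f"no published checkpoint for size={size!r}, context={context!r}"
        )
    return YARN_MODELS[key]


def load_yarn_model(size: str = "7b", context: str = "64k"):
    """Download and load a checkpoint; needs `transformers` and `torch` installed."""
    # Imported lazily so the lookup helper above works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = yarn_repo_id(size, context)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # Assumption: the checkpoints ship custom modeling code, hence trust_remote_code.
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype="auto",
        trust_remote_code=True,
    )
    return tokenizer, model
```

Calling `load_yarn_model("13b", "128k")` would download the 13B 128K checkpoint on first use. Long-context inference needs substantial GPU memory, so the 7B 64K variant is a reasonable starting point.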
Reproduction
We strongly believe in open science, and thus publish all the code and data needed to reproduce the results in our paper. To reproduce them, clone the repository and perform a local installation.
git clone https://github.com/jquesnelle/yarn
cd yarn
pip install -e .
Training
To train the models, run accelerate config and enable DeepSpeed acceleration. deepspeed/zero3.json was the configuration file used for training.
# ./train.sh
The tokenized training data is available on Hugging Face and was derived from the pg19 dataset.
Evaluation
To reproduce the evaluations, install lm-evaluation-harness with pip install git+https://github.com/EleutherAI/lm-evaluation-harness and then run the two provided scripts.
# ./eval.sh
# ./eval-harness.sh
Citation
@misc{peng2023yarn,
title={YaRN: Efficient Context Window Extension of Large Language Models},
author={Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole},
year={2023},
eprint={2309.00071},
archivePrefix={arXiv},
primaryClass={cs.CL}
}