YaRN: Efficient Context Window Extension of Large Language Models
This repo contains the code and data for the YaRN context window extension method.
Awaiting arXiv announcement; a citation will be added here once it is available.

We publish 7B and 13B variants of Llama 2 fine-tuned with YaRN at 64K and 128K context window lengths. They are available under the Llama 2 license on 🤗 Hugging Face.
| Size | Context | Link |
|---|---|---|
| 7B | 64K | [NousResearch/Yarn-Llama-2-7b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-7b-64k) |
| 7B | 128K | [NousResearch/Yarn-Llama-2-7b-128k](https://huggingface.co/NousResearch/Yarn-Llama-2-7b-128k) |
| 13B | 64K | [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k) |
| 13B | 128K | [NousResearch/Yarn-Llama-2-13b-128k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-128k) |
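A released checkpoint can be loaded directly with 🤗 Transformers. The sketch below is illustrative (the dtype, device placement, and prompt are arbitrary choices); `trust_remote_code=True` is passed because the checkpoints may ship custom YaRN modeling code.

```python
# Minimal sketch: load one of the checkpoints from the table above.
# dtype/device settings and the prompt are illustrative, not prescriptive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-7b-64k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep memory manageable for a 7B model
    device_map="auto",
    trust_remote_code=True,      # the checkpoints may include custom YaRN attention code
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```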
We strongly believe in open science, and therefore publish all code and data needed to reproduce the results in our paper. To reproduce them, clone the repository and perform a local installation:

```sh
git clone https://github.com/jquesnelle/yarn
cd yarn
pip install -e .
```
To train the models, run `accelerate config` and enable DeepSpeed acceleration; `deepspeed/zero3.json` is the DeepSpeed configuration file that was used for training.
```sh
./train.sh
```
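For reference, a ZeRO stage-3 DeepSpeed configuration typically contains fields like the ones sketched below. This is only an illustration of the format, not the contents of the repository's `deepspeed/zero3.json`, which remains the source of truth.

```python
# Illustrative sketch of a ZeRO stage-3 DeepSpeed config (not the repo's exact
# deepspeed/zero3.json); it only shows the kind of fields such a file contains.
import json

zero3_example = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_clipping": 1.0,
    "train_micro_batch_size_per_gpu": 1,
}

with open("zero3-example.json", "w") as f:
    json.dump(zero3_example, f, indent=2)
```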
The tokenized training data is available on Hugging Face and was derived from the pg19 dataset.
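Purely to illustrate what "tokenized" training data means here, the sketch below packs pg19 text into fixed-length token blocks. The tokenizer id, the 64K chunk length, and the packing details are assumptions, not the exact pipeline used to build the published dataset.

```python
# Illustrative sketch: tokenize pg19 and pack it into fixed-length chunks.
# The tokenizer id, chunk length, and packing details are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

CHUNK_LEN = 65536  # 64K-token training sequences

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
pg19 = load_dataset("pg19", split="train", streaming=True)

buffer, chunks = [], []
for book in pg19:
    # BOS is added by the tokenizer; append EOS to mark the end of each book.
    buffer.extend(tokenizer(book["text"]).input_ids + [tokenizer.eos_token_id])
    while len(buffer) >= CHUNK_LEN:
        chunks.append(buffer[:CHUNK_LEN])
        buffer = buffer[CHUNK_LEN:]
    if len(chunks) >= 4:  # small demo limit
        break

print(f"built {len(chunks)} chunks of {CHUNK_LEN} tokens each")
```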
To reproduce the evaluations, install lm-evaluation-harness:

```sh
pip install git+https://github.com/EleutherAI/lm-evaluation-harness
```

and then run the two provided scripts:
```sh
./eval.sh
./eval-harness.sh
```
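The two scripts above are the authoritative evaluation. As a rough standalone illustration of long-context evaluation, perplexity over a long document can be estimated with a sliding window; the model id, window size, stride, and input file below are arbitrary choices.

```python
# Rough sketch: sliding-window perplexity on a long document.
# Model id, window size, stride, and the input file are illustrative choices.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-7b-64k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
).eval()

text = open("long_document.txt").read()  # any sufficiently long plain-text file
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
seq_len = input_ids.size(1)

window, stride = 8192, 4096
nll_sum, n_tokens, prev_end = 0.0, 0, 0
for begin in range(0, seq_len, stride):
    end = min(begin + window, seq_len)
    trg_len = end - prev_end                  # tokens newly scored in this window
    ids = input_ids[:, begin:end]
    labels = ids.clone()
    labels[:, :-trg_len] = -100               # ignore the overlapping prefix
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    n = (labels != -100).sum().item()         # approximate count (ignores the 1-token shift)
    nll_sum += loss.item() * n
    n_tokens += n
    prev_end = end
    if end == seq_len:
        break

print("perplexity:", math.exp(nll_sum / n_tokens))
```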