This is the official repository for the preprint "Extending LLMs' Context Window with 100 Samples".
We introduce 'Entropy-Aware ABF', which enables efficient context window extension of RoPE-based LLMs with only 100 samples. The repository contains the code and data needed to reproduce our model.
We release long-context Llama-2-7b-chat models extended with our method and trained on different amounts of data on 🤗Hugging Face:
| Data | Link |
|---|---|
| 0.1k | 🤗eabf-llama2-7b-chat-0.1k |
| 1k | 🤗eabf-llama2-7b-chat-1k |
| 3.5k | 🤗eabf-llama2-7b-chat-3.5k |
We also release our training data on 🤗Hugging Face Datasets.
To use our code, your `transformers` library should be version 4.31 or higher.
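A quick way to verify this, sketched here as an optional check (it relies on `packaging`, which is already a dependency of `transformers`):

```python
from packaging import version
import transformers

# The monkey patch targets the Llama attention implementation shipped since transformers 4.31.
assert version.parse(transformers.__version__) >= version.parse("4.31.0"), \
    "Please upgrade: pip install -U 'transformers>=4.31'"
```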
We adopt the paper summarization test proposed in the NTK-Aware scaling blog post as a sanity check.
In short, to load the LLaMA model with our method, you should first import the required packages:

```python
from transformers.models.llama.modeling_llama import LlamaForCausalLM
import patch.eabf as eabf
```
Then, you can load the model by passing the right `rope_scaling` argument and applying our monkey-patching function:

```python
model = LlamaForCausalLM.from_pretrained(MODEL_NAME_OR_PATH, ..., rope_scaling={"type": "eabf", "factor": 4})
eabf.apply_eabf(model)
```
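For reference, a minimal end-to-end sketch that puts the two steps together. `MODEL_NAME_OR_PATH` and `LONG_PROMPT` are placeholders you would supply yourself, and the dtype, device map, and generation settings are illustrative rather than prescribed by the repository:

```python
import torch
from transformers import AutoTokenizer
from transformers.models.llama.modeling_llama import LlamaForCausalLM
import patch.eabf as eabf

MODEL_NAME_OR_PATH = "eabf-llama2-7b-chat-3.5k"  # placeholder: substitute the Hub id or a local path
LONG_PROMPT = "..."  # placeholder: your long-context input

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME_OR_PATH)
model = LlamaForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    torch_dtype=torch.float16,
    device_map="auto",
    rope_scaling={"type": "eabf", "factor": 4},
)
eabf.apply_eabf(model)  # patch attention before running inference

inputs = tokenizer(LONG_PROMPT, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```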
Other RoPE-based LLMs may or may not follow the same attention-score pattern as Llama-2-7b-chat. We therefore release our code for retrieving attention scores and computing the 'attention entropy', so that users can tailor our method to their own models.
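As an illustration only (the repository's own scripts are the reference), attention entropy can be computed from the post-softmax attention weights returned with `output_attentions=True`; the `attention_entropy` helper below is a hypothetical sketch, not the released implementation:

```python
import torch

def attention_entropy(attn_probs: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of each query's attention distribution.

    attn_probs: post-softmax weights of shape (batch, heads, q_len, kv_len),
                e.g. one element of model(..., output_attentions=True).attentions.
    Returns the entropy averaged over batch and query positions, one value per head.
    """
    eps = 1e-12  # guard against log(0) on fully masked positions
    entropy = -(attn_probs * torch.log(attn_probs + eps)).sum(dim=-1)  # (batch, heads, q_len)
    return entropy.mean(dim=(0, 2))
```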