A LLaMA implementation built on NVIDIA Megatron-LM.
This repository is intended as a minimal, hackable and readable example of loading LLaMA (arXiv) models and running inference with NVIDIA Megatron-LM. To download the checkpoints and tokenizer, fill out this Google form.
In a conda environment with PyTorch and CUDA available, run:
pip install -r requirements.txt
# Install Nvidia APEX
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
# Install Megatron-LM
pip install --no-build-isolation git+https://github.com/MoFHeka/Megatron-LM.git
Then in this repository:
pip install -e .
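After installing, a quick way to confirm that the pieces are importable is a small standard-library check like the sketch below. It only looks modules up with `importlib.util.find_spec` and does not import them, so it cannot tell whether the CUDA kernels actually load. The module names are assumptions: `apex_C` and `amp_C` are the extension modules that APEX's `--cpp_ext` and `--cuda_ext` options normally build, and `megatron` is the package name used by upstream Megatron-LM, which the fork is assumed to keep.

```python
import importlib.util

# Import names assumed for the install steps above (not confirmed by this repo):
#   apex_C / amp_C are the compiled extensions that APEX's --cpp_ext / --cuda_ext
#   options are expected to build, and the Megatron-LM fork is assumed to expose
#   the upstream `megatron` package.
EXPECTED_MODULES = ("torch", "apex", "apex_C", "amp_C", "megatron")


def check_environment(modules=EXPECTED_MODULES):
    """Map each module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def missing_modules(status):
    """Return the names whose lookup failed, in their original order."""
    return [name for name, found in status.items() if not found]


if __name__ == "__main__":
    status = check_environment()
    for name, found in status.items():
        print(f"{name:<10} {'found' if found else 'MISSING'}")
```

Any name reported as MISSING points back to the corresponding install step.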
Once your request is approved, you will receive links to download the tokenizer and model files. Edit the download.sh script with the signed URL provided in the email to download the model weights and tokenizer.
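If download.sh follows the original LLaMA release, each download comes with an md5sum-style `checklist.chk` file. Assuming that, the sketch below verifies the downloaded files in pure Python, similar in spirit to `md5sum -c checklist.chk`. The checklist name and directory layout are assumptions, not something this repository guarantees.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(checklist_path):
    """Check every 'HASH  FILENAME' line of an md5sum-style checklist.

    File names are resolved relative to the checklist's directory.
    Returns a list of (filename, ok) tuples in checklist order.
    """
    checklist_path = Path(checklist_path)
    results = []
    for line in checklist_path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = checklist_path.parent / name
        ok = target.is_file() and md5_of(target) == expected.lower()
        results.append((name, ok))
    return results
```

For example, `verify_checklist("llama/7B/checklist.chk")` (a hypothetical path) returns one `(filename, ok)` pair per listed file, so a failed or truncated download shows up as `False`.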
The LLaMA modeling code is rebuilt on top of Megatron in llama_model.py; the class LLAMAModel is the entry point.
tools/transform_huggingface_to_megatron.py and its counterpart for the opposite direction are provided for converting LLaMA checkpoints between the Hugging Face and Megatron formats.
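Converting a checkpoint for tensor parallelism mostly means slicing each linear weight across TP ranks. The sketch below is a hypothetical illustration of the usual Megatron convention, not the logic of the repository's tools: weights are `[out_features, in_features]` as in `torch.nn.Linear`, column-parallel layers split the output dimension, and row-parallel layers split the input dimension. In a typical Megatron LLaMA layout, the Q/K/V and MLP gate/up projections are column-parallel, while the attention output and MLP down projections are row-parallel. Real converters may also fuse or reorder Q/K/V per head, which this sketch ignores.

```python
# Hypothetical sketch: a weight is a nested list of shape [out_features, in_features],
# the same layout torch.nn.Linear uses for its `weight` tensor.


def _check_divisible(size, tp_size, what):
    """Return size // tp_size, or raise if the dimension cannot be split evenly."""
    if size % tp_size:
        raise ValueError(f"{what}={size} is not divisible by tp_size={tp_size}")
    return size // tp_size


def split_column_parallel(weight, tp_size):
    """Column-parallel: each rank keeps a contiguous block of output rows."""
    step = _check_divisible(len(weight), tp_size, "out_features")
    return [weight[r * step:(r + 1) * step] for r in range(tp_size)]


def split_row_parallel(weight, tp_size):
    """Row-parallel: each rank keeps a contiguous block of input columns."""
    step = _check_divisible(len(weight[0]), tp_size, "in_features")
    return [[row[r * step:(r + 1) * step] for row in weight] for r in range(tp_size)]


def merge_column_parallel(shards):
    """Inverse of split_column_parallel: stack the row blocks back together."""
    return [row for shard in shards for row in shard]


def merge_row_parallel(shards):
    """Inverse of split_row_parallel: concatenate each row's column blocks."""
    return [[x for shard in shards for x in shard[i]] for i in range(len(shards[0]))]
```

The merge functions are what a Megatron-to-Hugging-Face conversion would need: gather the per-rank shards and undo the split along the same dimension.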
First, run tools/preprocess_data.py to generate a Megatron-style pretraining text dataset. Alternatively, you can write your own pretraining code, following custom_pretrain_llama.py together with custom_training.py.
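Upstream Megatron-LM's `tools/preprocess_data.py` reads loose JSON, one object per line, taking the document text from the `text` key by default (configurable with `--json-keys`). Assuming this fork keeps that input format, a raw corpus can be prepared with a sketch like this:

```python
import json
from pathlib import Path


def write_loose_json(documents, output_path, key="text"):
    """Write one JSON object per line, e.g. {"text": "..."}.

    Empty or whitespace-only documents are skipped.
    Returns the number of documents written.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as f:
        for doc in documents:
            if not doc.strip():
                continue
            # ensure_ascii=False keeps non-ASCII text readable in the file
            f.write(json.dumps({key: doc}, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    n = write_loose_json(["First document.", "Second document."], Path("corpus.jsonl"))
    print(f"wrote {n} documents")
```

The resulting file would then be passed to the script via `--input`; the tokenizer and output options may differ in this fork, so check `python tools/preprocess_data.py --help`.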
The provided pretrain_llama.py can be run on a single- or multi-GPU node with torchrun and will output completions for two predefined prompts. Use pretrain_llama_distributed.sh to launch it:
sh pretrain_llama_distributed.sh {dataset_folder} {ckpt_folder} {tokenizer_model} {tensorboard_folder} {tensor_parallel_size} {pipeline_parallel_size} {number_of_nodes}
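Megatron factors the total number of GPUs as world_size = TP * PP * DP (ignoring newer options such as context parallelism), so TP * PP must divide the GPU count. The helper below computes the resulting data-parallel size for a given launch; `gpus_per_node` is a hypothetical parameter, because the launch script does not take it as an argument and may assume a fixed value.

```python
def data_parallel_size(gpus_per_node, num_nodes, tp_size, pp_size):
    """Return the data-parallel size implied by a launch configuration.

    Raises ValueError when tensor * pipeline parallelism does not evenly
    divide the total number of GPUs.
    """
    world_size = gpus_per_node * num_nodes
    model_parallel = tp_size * pp_size
    if world_size % model_parallel:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={model_parallel}"
        )
    return world_size // model_parallel
```

For instance, a 13B run with TP=2 and PP=1 on one 8-GPU node leaves a data-parallel size of 4.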
Different model sizes require different tensor-parallel (TP) sizes:
| Model | TP size |
|---|---|
| 7B | 1 |
| 13B | 2 |
| 33B | 4 |
| 65B | 8 |
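Megatron divides attention heads evenly across tensor-parallel ranks, so TP must divide each model's head count. The sketch below checks the table against the head counts listed in Table 2 of the LLaMA paper (32, 40, 52 and 64); it is an illustrative consistency check, not code from this repository.

```python
# Attention-head counts from Table 2 of the LLaMA paper.
LLAMA_HEADS = {"7B": 32, "13B": 40, "33B": 52, "65B": 64}
# TP sizes from the table above.
RECOMMENDED_TP = {"7B": 1, "13B": 2, "33B": 4, "65B": 8}


def heads_per_rank(model, tp_size):
    """Return how many attention heads each TP rank holds for a model size."""
    n_heads = LLAMA_HEADS[model]
    if n_heads % tp_size:
        raise ValueError(f"{model}: {n_heads} heads cannot be split across TP={tp_size}")
    return n_heads // tp_size


if __name__ == "__main__":
    for model, tp in RECOMMENDED_TP.items():
        print(f"{model}: TP={tp} -> {heads_per_rank(model, tp)} heads per rank")
```

Every recommended TP size passes the check, whereas `heads_per_rank("33B", 8)` raises, since 52 heads cannot be split evenly across 8 ranks.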
LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971
@article{touvron2023llama,
title={LLaMA: Open and Efficient Foundation Language Models},
author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
journal={arXiv preprint arXiv:2302.13971},
year={2023}
}
See MODEL_CARD.md.
See the LICENSE file.