LLaMA-Megatron

A LLaMA implementation built on Megatron-LM.

LLaMA

This repository is intended as a minimal, hackable, and readable example of using NVIDIA Megatron-LM to load LLaMA (arXiv) models and run inference. To download the checkpoints and tokenizer, fill out this Google form.

Setup

In a conda environment with PyTorch and CUDA available, run:

pip install -r requirements.txt

# Install Nvidia APEX
git clone https://github.com/NVIDIA/apex
cd apex
# If pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1), which supports multiple `--config-settings` with the same key:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# Otherwise:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

# Install Megatron-LM
pip install --no-build-isolation git+https://github.com/MoFHeka/Megatron-LM.git
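The choice between the two Apex install commands depends only on whether pip is at least 23.1. A minimal sketch of that check follows; the function names are hypothetical and not part of this repository, and pre-release suffixes are ignored for simplicity.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def supports_repeated_config_settings(pip_version: str) -> bool:
    """Return True if pip_version is 23.1 or newer (hypothetical helper)."""
    # Compare only the leading "major.minor" part; any suffix is ignored.
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError(f"unrecognized pip version: {pip_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (23, 1)


def installed_pip_version() -> Optional[str]:
    """Return the version of pip in the current environment, or None."""
    try:
        return version("pip")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pip_version()
    if found is None:
        print("pip was not found in this environment")
    elif supports_repeated_config_settings(found):
        print(f"pip {found}: use the --config-settings form")
    else:
        print(f"pip {found}: use the --global-option form")
```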

Then in this repository:

pip install -e .
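After the installs above, a quick check that the packages are visible to the interpreter can save a failed launch later. The sketch below assumes the import names `torch`, `apex`, and `megatron`; adjust them if the installed packages expose different module names.

```python
from importlib.util import find_spec

# Assumed top-level import names for PyTorch, Apex, and Megatron-LM.
REQUIRED_MODULES = ("torch", "apex", "megatron")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```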

Download

Once your request is approved, you will receive links to download the tokenizer and model files. Edit the download.sh script with the signed URL provided in the email, then run it to download the model weights and tokenizer.

Model

The LLaMA modeling code was rebuilt on top of Megatron and lives in llama_model.py. The LLAMAModel class is the entry point.

Checkpoint Transform

tools/transform_huggingface_to_megatron.py and its counterpart for the reverse direction are provided for converting LLaMA model checkpoints between the Hugging Face and Megatron formats.

Pretrain

First, run tools/preprocess_data.py to generate a Megatron-style pretraining text dataset. Alternatively, write your own pretraining code, as custom_pretrain_llama.py does with custom_training.py.
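Upstream Megatron-LM's preprocess_data.py reads a file with one JSON object per line and, by default, takes the document text from a `text` key. Assuming the copy in this repository behaves the same way (worth verifying against its argument parser), the raw corpus can be prepared with a sketch like the following; `write_jsonl` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_jsonl(documents, path, key="text"):
    """Write one JSON object per line; return the number of lines written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for doc in documents:
            doc = doc.strip()
            if not doc:
                continue  # skip empty documents
            handle.write(json.dumps({key: doc}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_jsonl(["first document", "second document"], "corpus.jsonl")` produces a two-line file that can then be passed to the preprocessing script as its input.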

The provided pretrain_llama.py can be run on a single-GPU or multi-GPU node with torchrun and will output completions for two pre-defined prompts. Use pretrain_llama_distributed.sh to run it:

sh pretrain_llama_distributed.sh {dataset_folder} {ckpt_folder} {tokenizer_model} {tensorboard_folder} {tensor_parallel_size} {pipeline_parallel_size} {number_of_nodes}
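The script takes seven positional arguments in the order shown above. When launching from Python, a small helper keeps that order explicit; this is a sketch, `build_pretrain_command` is a hypothetical name, and the paths in the usage example are placeholders.

```python
import shlex


def build_pretrain_command(
    dataset_folder,
    ckpt_folder,
    tokenizer_model,
    tensorboard_folder,
    tensor_parallel_size,
    pipeline_parallel_size,
    number_of_nodes,
    script="pretrain_llama_distributed.sh",
):
    """Return the argv list for the launch script, in its positional order."""
    args = [
        dataset_folder,
        ckpt_folder,
        tokenizer_model,
        tensorboard_folder,
        tensor_parallel_size,
        pipeline_parallel_size,
        number_of_nodes,
    ]
    return ["sh", script] + [str(arg) for arg in args]


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations before running.
    command = build_pretrain_command(
        "data/", "ckpt/", "tokenizer.model", "tb_logs/", 2, 1, 1
    )
    print(shlex.join(command))
```

The returned list can be passed directly to `subprocess.run` once the paths point at real data.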

Different models require different tensor-parallel (TP) sizes:

Model	TP
7B	1
13B	2
33B	4
65B	8
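In Megatron-style training, the total number of GPUs must be divisible by the product of the tensor-parallel and pipeline-parallel sizes, and the quotient is the data-parallel size. The sketch below checks a planned layout against the table; `gpus_per_node` is an assumed input that the launch script does not take directly, and the helper name is hypothetical.

```python
# TP sizes copied from the table above.
TP_BY_MODEL = {"7B": 1, "13B": 2, "33B": 4, "65B": 8}


def data_parallel_size(num_nodes, gpus_per_node, tensor_parallel, pipeline_parallel=1):
    """Return the data-parallel size for a given GPU layout."""
    world_size = num_nodes * gpus_per_node
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split evenly into groups of "
            f"{model_parallel} (TP x PP)"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # One node with 8 GPUs (an assumed configuration) running the 13B model.
    print(data_parallel_size(1, 8, TP_BY_MODEL["13B"]))
```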

Reference

LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

Model Card

See MODEL_CARD.md

License

See the LICENSE file.
