
🥳 Platypus

8/11/23: Expect an update on our full coding pipeline within the next week!

The Platypus models are a series of fine-tuned variants based on the LLaMA and LLaMA 2 transformer architectures. Platypus takes advantage of LoRA.

All models are available via HuggingFace: garage-bAInd
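
As a quick sanity check after downloading, a merged checkpoint can be loaded directly with Hugging Face transformers. The sketch below is illustrative only: the fp16 dtype, device_map="auto" (which requires accelerate), and Alpaca-style prompt are assumptions, not part of this repo's scripts.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "garage-bAInd/Platypus-30B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory footprint
    device_map="auto",          # spreads layers across available GPUs (needs accelerate)
)

# Alpaca-style prompt, matching the --conv_template alpaca used with FastChat below
prompt = "### Instruction:\nWhat is a platypus?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))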

CLI

FastChat provides a simple setup for those interested in running the model. After downloading the model through HuggingFace, clone the FastChat repository:

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

Install the required packages:

pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e .

Finally, run the following:

python3 -m fastchat.serve.cli --model-path garage-bAInd/Platypus-30B --conv_template alpaca

Local Setup

This repository is multi-GPU friendly and provides code for model or data parallelism, depending on your computational resources.

  1. Install dependencies:

     pip install -r requirements.txt

  2. Be sure to use these exact requirements or you may run into model saving or OOM issues.

Fine-tuning (finetune.py)

Run fine-tuning.sh.

Note: The script above uses torchrun for data parallelism. PyTorch is not in requirements.txt since, technically, you can run fine-tuning without it. To use fine-tuning.sh, please install PyTorch. We recommend torchrun and PyTorch 2.0+ for speed and torch.compile. If you do not install PyTorch, please take the time to comment out any torch-related lines in the scripts.
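
For reference, a data-parallel launch looks roughly like the following; the GPU count and flags match the 4-GPU example and hyperparameter table below, but the exact command in fine-tuning.sh may differ.

torchrun --nnodes 1 --nproc_per_node 4 finetune.py \
    --base_model './llama30B_hf' \
    --data_path './train_final.json' \
    --output_dir './Platypus-30b' \
    --batch_size 128 \
    --micro_batch_size 8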

Hyperparameters used to fine-tune Platypus-30B:

Hyperparameter        Value
learning rate         4e-4
batch size            128
microbatch size       8
warmup ratio          0.03
epochs                1
weight decay          0.
lr scheduler          cosine
lora alpha            16
lora rank             16
lora dropout          0.05
lora target modules   q_proj,k_proj,v_proj,o_proj
cutoff length         2048
train on inputs       False
group by length       False
add eos token         False

Gradient accumulation steps = global_batch_size / micro_batch_size / num_gpus = 128 / 8 / 4 = 4.

If your model cannot fit in the memory of each GPU, please use the alternative fine-tuning option below to take advantage of model parallelism.

python finetune.py \
    --base_model './llama30B_hf' \
    --data_path './train_final.json' \
    --output_dir './Platypus-30b' \
    --batch_size 128 \
    --micro_batch_size 16 \
    --num_epochs 1 \
    --learning_rate 4e-4 \
    --cutoff_len 2048 \
    --val_set_size 0 \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,k_proj,v_proj,o_proj]' \
    --train_on_inputs False \
    --group_by_length False

Reproducing Evaluation Results

Install LM Evaluation Harness:

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
git checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463 # The commit used by the Open LLM Leaderboard
pip install -e .

Each task was evaluated on a single A100 80GB GPU.

ARC:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-30B --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/arc_challenge_25shot.json --device cuda --num_fewshot 25

HellaSwag:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-30B --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/hellaswag_10shot.json --device cuda --num_fewshot 10

MMLU:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-30B --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/mmlu_5shot.json --device cuda --num_fewshot 5

TruthfulQA:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-30B --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/truthfulqa_0shot.json --device cuda

Inference (inference.py)

Run inference using a CSV or JSON file. Inference commands follow the same structure noted above for fine-tuning.
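
If you would rather call the adapter from Python than through the script, a minimal sketch with peft looks like the following; the base-model and adapter paths reuse the placeholders from the fine-tuning example above and should be replaced with your own.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = "./llama30B_hf"   # base model path (placeholder from the example above)
adapter_path = "./Platypus-30b"     # LoRA output_dir produced by finetune.py

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_path)  # attach the LoRA adapter
model.eval()

prompt = "### Instruction:\nName three mammals that lay eggs.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))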

Checkpoint export (export_*_checkpoint.py)

These scripts merge the LoRA weights back into the base model so it can be exported to HuggingFace format or to PyTorch state_dicts. They should help users who want to run inference in projects like llama.cpp or alpaca.cpp.
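
As a rough outline of what such a merge involves (not the exact contents of the export scripts), peft can fold the LoRA deltas into the base weights and save the result in standard HuggingFace format; the paths below are placeholders.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("./llama30B_hf", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "./Platypus-30b")

merged = model.merge_and_unload()                 # fold LoRA weights into the base model
merged.save_pretrained("./Platypus-30b-merged")   # standard HuggingFace format

tokenizer = AutoTokenizer.from_pretrained("./llama30B_hf")
tokenizer.save_pretrained("./Platypus-30b-merged")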