This is the repo for Llama-X, which aims to:
- Progressively improve LLaMA to a SOTA LLM with the open-source community.
- Conduct Llama-X as long-term, systematic, and rigorous open academic research.
- Reduce repetitive work across the community and work together to make more and faster progress.
The project will follow these principles:
- We will publish all the code, model, data, and experiment details.
- We will continuously improve the model version by version and open-source the newest methods.
- We will summarize the methods of each main version as academic papers.
- We announce a complete research plan. Contributors are welcome to cooperate with each other to progressively improve Llama-X through iteration of the target versions.
- The check-in of a new model must achieve a significant improvement over the current version on automatic evaluation.
📣 Please join us if you are interested in Llama-X. Let's Make AI Open Again.
[1]. Research on Instruction Tuning
- instruction-following tuning
[2]. Research on RLHF & RLAIF
- fundamental RLHF
- AI learning from AI
[3]. Research on Data Quality
- high-quality data for pre-training, fine-tuning, user feedback, multi-modality, etc.
[4]. Research on Long Context Transformer
- enable efficient transformers for long sequences (>30k)
[5]. Research on Multi-modal (text + image) Modeling
- text + image in; text out
[6]. Research on Multilingual Modeling
- comparable multilingual performance with English
[7]. Research on Efficient Infrastructure and Optimization
- improve training and inference speed
- build a deep learning stack that scales predictably
[8]. Research on Evaluation
- comprehensive evaluation of model capabilities
[9]. Research on Interpretability
- interpret the source of each capability of LLMs
[10]. Research on LLM on Actions
- combine LLMs with search, recommendation, and other plugins
| Llama-X | Baseline | Performance |
|---|---|---|
| 3.0.0 (LLaMA) | GPT-3 | Outperform |
| 3.1.0 | text-davinci-001 | Comparable |
| 3.2.0 | text-davinci-002 | Comparable |
| 3.3.0 | text-davinci-003 | Comparable |
| 3.5.0 | gpt-35-turbo | Comparable |
| 3.6.0 | GPT-4 | 80% Avg. Gap |
| 3.7.0 | GPT-4 | 60% Avg. Gap |
| 3.8.0 | GPT-4 | 40% Avg. Gap |
| 3.9.0 | GPT-4 | 20% Avg. Gap |
| 4.0.0 | GPT-4 | Comparable |
We are currently focusing on research areas [1] and [3] above, and will publish our first model version (Llama-X 3.0.1) and paper before 4/9/2023.
Each new version of the Llama-X model should significantly outperform (>+1%) the current version on automatic evaluation across all of the following Type-A benchmarks. Additional evaluation on the Type-B benchmarks will be added from version 3.6.0 onward:
| Type | Benchmarks |
|---|---|
| A | MMLU |
| A | HumanEval |
| A | GSM-8K |
| A | NaturalQuestions |
| A | TruthfulQA |
| B | Leetcode |
| B | GRE |
| B | AP |
| B | MMLU-Multilingual |
| B | Visual Inputs (TBD) |
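To make the check-in rule concrete, the sketch below shows one possible gate. It is hypothetical, not part of the repo: the benchmark list mirrors the Type-A table, the "absolute points" reading of >+1% is an assumption, and the scores in the example are made up.

```python
# Hypothetical check-in gate: a candidate version must beat the current
# version by more than 1 point on every Type-A benchmark (the "absolute
# points" interpretation of >+1% is an assumption).
TYPE_A = ["MMLU", "HumanEval", "GSM-8K", "NaturalQuestions", "TruthfulQA"]

def passes_check_in(current: dict, candidate: dict, margin: float = 1.0) -> bool:
    """True if the candidate improves on every Type-A benchmark by > margin."""
    return all(candidate[b] - current[b] > margin for b in TYPE_A)

# Example with made-up scores:
current = {"MMLU": 35.1, "HumanEval": 10.5, "GSM-8K": 11.0,
           "NaturalQuestions": 21.7, "TruthfulQA": 27.3}
candidate = {"MMLU": 37.2, "HumanEval": 12.1, "GSM-8K": 13.5,
             "NaturalQuestions": 23.0, "TruthfulQA": 29.0}
print(passes_check_in(current, candidate))  # True
```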
- Setup. Install the conda environment:
```bash
conda create -n llamax python=3.10
conda activate llamax
git clone https://github.com/AetherCortex/Llama-X.git
cd Llama-X/src
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
cd ../..
pip install -r requirements.txt
```
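After installation, a quick sanity check can confirm the environment is usable; the expectations in the comments follow the pinned packages above.

```python
# Optional sanity check for the llamax environment.
import torch
import transformers

print(torch.__version__)          # expect 1.12.0 per the install command above
print(torch.cuda.is_available())  # expect True on a GPU machine
print(transformers.__version__)   # installed from source above
```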
- Training data example (e.g., Stanford Alpaca):
```
Llama-X/src/data/alpaca_data.json
```
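The file is expected to follow the Stanford Alpaca instruction format (a JSON list of instruction/input/output records); a minimal inspection sketch, assuming that schema:

```python
# Inspect the Alpaca-style training data (schema assumed from Stanford Alpaca).
import json

with open("data/alpaca_data.json") as f:  # run from Llama-X/src
    data = json.load(f)

print(len(data))  # number of training examples
print(data[0])    # e.g. {"instruction": "...", "input": "", "output": "..."}
```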
- Convert the LLaMA checkpoint to Hugging Face format:
```bash
cd Llama-X/src
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama-7B/ \
    --model_size 7B \
    --output_dir /path/to/llama-7B/hf
```
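As an optional check that the conversion succeeded, the converted directory should load with the Transformers LLaMA classes (the path follows the command above):

```python
# Optional: verify the converted checkpoint loads with Hugging Face Transformers.
from transformers import LlamaForCausalLM, LlamaTokenizer

hf_path = "/path/to/llama-7B/hf"  # output_dir of the conversion script
tokenizer = LlamaTokenizer.from_pretrained(hf_path)
model = LlamaForCausalLM.from_pretrained(hf_path)
print(model.config)
```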
- Train LLaMA-7B with DeepSpeed ZeRO-3:
```bash
deepspeed train.py \
    --model_name_or_path /path/to/llama-7B/hf \
    --data_path /path/to/example_data.json \
    --output_dir /path/to/llama-7B/hf/ft \
    --num_train_epochs 3 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --warmup_steps 2 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True
```
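The training command references configs/deepspeed_config.json, which ships with the repo. For orientation, a minimal ZeRO-3 configuration along these lines (assumed contents, not the repo's actual file) can be generated like this:

```python
# Write a minimal (assumed) ZeRO-3 DeepSpeed config; "auto" values are
# resolved by the Hugging Face Trainer from the command-line arguments.
import json

zero3_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("configs/deepspeed_config.json", "w") as f:
    json.dump(zero3_config, f, indent=2)
```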
- The current code of Llama-X supports:
  - Full fine-tuning: optimize the full LLaMA checkpoint, instead of Low-Rank Adaptation (LoRA).
  - High efficiency: training a 7B model with 50k examples/epoch and batch_size=64 within 1 hour on 8 x V100 GPUs.
| LLaMA | Batch Size | V100s | Time (h) |
|---|---|---|---|
| 7B | 64 | 8 | 1.00 |
| 13B | 32 | 8 | 1.75 |
- Inference

```bash
# web demo inference
python generate.py
```

Batch inference: To Do.
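Until the batch-inference script lands, a hypothetical sketch with the Transformers generate API (the checkpoint path and Alpaca-style prompt template are assumptions) could look like this:

```python
# Hypothetical batch-inference sketch (not part of the repo yet).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "/path/to/llama-7B/hf/ft"  # output_dir from the training command
tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
tokenizer.padding_side = "left"            # left-pad for decoder-only batching
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).cuda()

# Prompt format follows the Alpaca template (assumption).
prompts = [
    "### Instruction:\nList three fruits.\n\n### Response:",
    "### Instruction:\nBriefly explain gravity.\n\n### Response:",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```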
Developers can become Contributors by contributing helpful code, data, papers, computing resources, etc.
- Code: Including algorithm implementations, training optimization, inference optimization, and model deployment.
- Data: Every research area and version iteration requires high-quality data, including instruction-answer, pre-training, multi-modal, multilingual, and user feedback data, etc.
- Paper: We will maintain a Llama-X Paper List and use Llama-X as the base model for optimized, fully tested, and significantly improved academic papers. You can check your paper into the Llama-X Paper List.
- Computing resources: We hope to accelerate model iteration by coordinating spare computing power from developers or non-profit sponsorship from universities and enterprises.

You can reach the project via:
- GitHub Issues
- Email: llama-x@mail.com
This project has been inspired by multiple open-source projects:
- Hugging Face Transformers (LLaMA)
- Alpaca and Alpaca-LoRA
Llama-X is currently for academic purposes only; please do not apply it to commercial scenarios or products.