Build and control your own LLMs
xTuring provides fast, efficient and simple fine-tuning of LLMs such as LLaMA, GPT-J, GPT-2, OPT, Cerebras-GPT, Galactica, and more.
By providing an easy-to-use interface for personalizing LLMs to your own data and application, xTuring makes it simple to build and control LLMs.
The entire process can run on your own machine or in your private cloud, ensuring data privacy and security.
With xTuring you can:
- Ingest data from different sources and preprocess it into a format LLMs can understand
- Scale from a single GPU to multiple GPUs for faster fine-tuning
- Leverage memory-efficient techniques (e.g. INT4 and LoRA fine-tuning) to reduce hardware costs by up to 90%
- Explore different fine-tuning methods and benchmark them to find the best-performing model
- Evaluate fine-tuned models on well-defined metrics for in-depth analysis
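As a rough sketch of how you switch between these options, each fine-tuning method is selected through the model key passed to BaseModel.create. The keys below are illustrative; the exact set of supported models and precisions depends on your installed xturing version.

from xturing.models import BaseModel

# Illustrative keys (availability depends on your xturing version):
#   "llama"           - full fine-tuning of LLaMA
#   "llama_lora"      - LoRA fine-tuning
#   "llama_lora_int8" - LoRA fine-tuning in INT8 precision
model = BaseModel.create("llama_lora_int8")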
🌟 New feature - INT4 fine-tuning with LLaMA LoRA
We are excited to announce the latest enhancement to our xTuring library: an INT4 fine-tuning demo. With this update, you can fine-tune LLMs such as LLaMA with the LoRA architecture in INT4 precision using less than 6 GB of VRAM. This breakthrough significantly reduces memory requirements and accelerates the fine-tuning process, allowing you to achieve state-of-the-art performance with fewer computational resources.
More information about INT4 fine-tuning and benchmarks can be found in the INT4 README.
You can check out the LLaMA INT4 fine-tuning example to see how it works.
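A minimal sketch of what this looks like in code is shown below. The model key llama_lora_int4 is an assumption here; check the INT4 README and the linked example for the exact entry point in your version.

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# "llama_lora_int4" is an assumed key; see the INT4 README for the exact entry point
instruction_dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("llama_lora_int4")
model.finetune(dataset=instruction_dataset)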
CLI playground
UI playground
⚙️ Installation
pip install xturing
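If you installed xturing earlier, upgrading pulls in the latest features such as the INT4 demo:

pip install --upgrade xturing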
🚀 Quickstart
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
# Load the dataset
instruction_dataset = InstructionDataset("./alpaca_data")
# Initialize the model
model = BaseModel.create("llama_lora")
# Finetune the model
model.finetune(dataset=instruction_dataset)
# Perform inference
output = model.generate(texts=["Why LLM models are becoming so important?"])
print("Generated output by the model: {}".format(output))
You can find the data folder here.
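If you want to use your own data instead of the provided Alpaca folder, a minimal sketch is to build a Hugging Face dataset with instruction, text and target columns and save it to disk. These column names and the on-disk layout mirror the Alpaca example and are assumptions here; see the "Preparing your dataset" tutorial below for the exact format.

from datasets import Dataset, DatasetDict
from xturing.datasets import InstructionDataset

# Column names assumed to mirror the Alpaca instruction format
data = {
    "instruction": ["Summarize the text below in one sentence."],
    "text": ["xTuring lets you fine-tune LLMs on your own data."],
    "target": ["xTuring is a library for fine-tuning LLMs on custom data."],
}

# Save in the same on-disk layout as ./alpaca_data (assumed), then load it as usual
DatasetDict({"train": Dataset.from_dict(data)}).save_to_disk("./my_instruction_data")
instruction_dataset = InstructionDataset("./my_instruction_data")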
📚 Tutorials
- Preparing your dataset
- Cerebras-GPT fine-tuning with LoRA and INT8
- Cerebras-GPT fine-tuning with LoRA
- LLaMA with LoRA and INT8
- LLaMA with LoRA
- LLaMA easy fine-tuning
- GPT-J efficient fine-tuning with LoRA and INT8
- GPT-J fine-tuning with LoRA
- Galactica fine-tuning with LoRA and INT8
- Galactica fine-tuning with LoRA
- OPT fine-tuning with LoRA and INT8
- OPT fine-tuning with LoRA
- GPT-2 fine-tuning with LoRA
📊 Performance
Here is a comparison of the performance of different fine-tuning techniques on the LLaMA 7B model. We use the Alpaca dataset, which contains 52K instructions, for fine-tuning.
Hardware:
4 x A100 40 GB GPUs, 335 GB CPU RAM
Fine-tuning parameters:
{
'maximum sequence length': 512,
'batch size': 1,
}
| LLaMA 7B | DeepSpeed + CPU Offloading | LoRA + DeepSpeed | LoRA + DeepSpeed + CPU Offloading |
| --- | --- | --- | --- |
| GPU memory | 33.5 GB | 23.7 GB | 21.9 GB |
| CPU memory | 190 GB | 10.2 GB | 14.9 GB |
| Time per epoch | 21 hours | 20 mins | 20 mins |
Please submit your performance results on other GPUs.
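To reproduce a comparable setup, the fine-tuning parameters above can be adjusted on the model's configuration before calling finetune. The sketch below assumes your xturing version exposes a finetuning_config() helper on the model; the attribute names (batch_size, max_length) are assumptions and may differ across versions.

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

instruction_dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("llama_lora")

# Assumed config attributes; inspect the returned object for the exact field names
finetuning_config = model.finetuning_config()
finetuning_config.batch_size = 1
finetuning_config.max_length = 512

model.finetune(dataset=instruction_dataset)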
📎 Fine-tuned model checkpoints
We have already fine-tuned some models that you can use as your base or start playing with. Here is how you would load them:
from xturing.models import BaseModel
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
| Model | Dataset | Path |
| --- | --- | --- |
| DistilGPT-2 LoRA | alpaca | x/distilgpt2_lora_finetuned_alpaca |
| LLaMA LoRA | alpaca | x/llama_lora_finetuned_alpaca |
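Once loaded, these checkpoints behave like any other model, so you can run inference on them directly. The save call below is a sketch for persisting your own fine-tuned weights; the method name and local path are assumptions, so check the documentation for your version.

from xturing.models import BaseModel

model = BaseModel.load("x/llama_lora_finetuned_alpaca")
output = model.generate(texts=["What is the capital of France?"])
print(output)

# Persist your own fine-tuned weights locally (method name assumed)
model.save("./llama_lora_finetuned_alpaca_local")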
📈 Roadmap
- Support for LLaMA, GPT-J, GPT-2, OPT, Cerebras-GPT, Galactica and Bloom models
- Dataset generation using self-instruction
- 2x more memory-efficient fine-tuning than LoRA, and unsupervised fine-tuning
- INT8 low-precision fine-tuning support
- Support for OpenAI, Cohere and AI21 Studio model APIs for dataset generation
- Fine-tuned checkpoints for some models added to the hub
- INT4 LLaMA LoRA fine-tuning demo
- Evaluation of LLM models
- Support for Stable Diffusion
🤝 Help and Support
If you have any questions, you can create an issue on this repository.
You can also join our Discord server and start a discussion in the #xturing channel.
📝 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
🌎 Contributing
As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.