# Fine-Tuning LLMs (Large Language Models) with Hugging Face
Description:
This guide equips you with step-by-step instructions on fine-tuning large language models (LLMs) using Hugging Face's powerful libraries and techniques. It leverages the `transformers` and `trl` libraries to demonstrate fine-tuning a pre-trained LLM for a specific task, enabling you to customize its capabilities beyond its initial functionality.
Prerequisites:
- Basic understanding of Python programming
- Familiarity with large language models and their applications
- Working knowledge of Hugging Face libraries (recommended but not essential)
Table of Contents:
- Introduction
- Installation and Setup
- Loading the Model and Tokenizer
- Configuring Training Arguments
- Creating the Fine-Tuning Trainer
- Training the Model
- Chatting with the Fine-Tuned Model
- Conclusion
## Introduction

Large language models (LLMs) are powerful tools capable of generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. However, their out-of-the-box capabilities may not always be tailored to your specific needs. Fine-tuning allows you to customize an LLM for a particular task, enhancing its performance and relevance in your chosen domain.
This guide utilizes Hugging Face libraries to demonstrate fine-tuning a pre-trained LLM for a specific objective. We'll walk you through the steps of loading the model, preparing the training environment, and fine-tuning it on a given dataset. Finally, we'll explore how to interact with the fine-tuned model to experience its customized capabilities.
## Installation and Setup

Before starting, ensure you have Python and pip installed on your system. Here's how to install the required libraries using pip:

```
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
!pip install huggingface_hub
```
- `accelerate`: Provides optimization tools for training.
- `peft`: Enables efficient training using Low-Rank Adaptation (LoRA).
- `bitsandbytes`: Offers quantization techniques for model optimization.
- `transformers`: Core library for loading and working with pre-trained models from the Hugging Face ecosystem.
- `trl`: Library for implementing Supervised Fine-Tuning (SFT) of pre-trained models.
- `huggingface_hub`: Facilitates access to pre-trained models and datasets from the Hugging Face Hub.
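As an optional sanity check (assuming the pinned versions above installed cleanly), you can confirm the libraries are importable and report their versions:

```python
# Optional sanity check: confirm the pinned libraries are importable.
import accelerate
import peft
import transformers
import trl

print(transformers.__version__)  # expect 4.31.0
print(trl.__version__)           # expect 0.4.7
```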
## Loading the Model and Tokenizer

This guide utilizes a pre-trained LLM named `aboonaji/llama2finetune-v2` from the Hugging Face Hub. We'll load both the model and its corresponding tokenizer:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model with a 4-bit quantization configuration for efficiency
llama_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
    ),
)

# Disable the KV cache and set the pretraining tensor-parallelism degree to 1
llama_model.config.use_cache = False
llama_model.config.pretraining_tp = 1

# Load the tokenizer
llama_tokenizer = AutoTokenizer.from_pretrained("aboonaji/llama2finetune-v2")

# Pad with the end-of-sequence token, padding on the right
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"
```
- We use `AutoModelForCausalLM.from_pretrained` to load the model, specifying the model name and the `quantization_config` for performance optimization.
- `config.use_cache` is set to `False` to avoid potential memory issues during fine-tuning.
- `config.pretraining_tp` is adjusted to a value of 1 (optional; explore the documentation for details).
- We use `AutoTokenizer.from_pretrained` to load the tokenizer.
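To see what the 4-bit quantization buys you, you can inspect the loaded model's approximate memory footprint (a minimal sketch, assuming the model above loaded successfully; NF4 weights should occupy roughly a quarter of their fp16 size):

```python
# Report the quantized model's approximate in-memory size.
footprint_gb = llama_model.get_memory_footprint() / 1e9
print(f"Model footprint: {footprint_gb:.2f} GB")

# Quick tokenizer round trip to confirm it is wired up correctly.
ids = llama_tokenizer("Fine-tuning customizes an LLM.", return_tensors="pt")
print(llama_tokenizer.decode(ids.input_ids[0]))
```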
## Configuring Training Arguments

The next step involves defining training arguments using the `TrainingArguments` class. These arguments control various aspects of the fine-tuning process, such as:

- Where to save training outputs (`output_dir`)
- The number of training examples processed on each device per batch (`per_device_train_batch_size`)
- The total number of training steps (`max_steps`)
```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    max_steps=100,
)
```
- We set `output_dir` to `"./results"` to store the training results in a folder named "results".
- We set `per_device_train_batch_size` to 4, indicating that 4 training examples will be processed on each device (e.g., GPU) per batch.
- We set `max_steps` to 100, limiting the training process to 100 steps (adjust this based on your dataset size and desired training intensity).
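`TrainingArguments` exposes many more knobs than the three used here. The sketch below is a drop-in alternative to the block above; the added values are illustrative assumptions, not tuned recommendations:

```python
from transformers import TrainingArguments

# Illustrative values only; tune these for your dataset and hardware.
training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    learning_rate=2e-4,              # a common starting point for LoRA adapters
    fp16=True,                       # mixed precision, matching the compute dtype above
    logging_steps=10,                # log the training loss every 10 steps
    max_steps=100,
)
```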
## Creating the Fine-Tuning Trainer

Now, we'll create the `SFTTrainer` object from the `trl` library. This trainer orchestrates the fine-tuning process, handling elements like:

- The model (`model`)
- The training arguments (`args`)
- The training dataset (`train_dataset`)
- The tokenizer (`tokenizer`)
- The configuration for Low-Rank Adaptation (LoRA) fine-tuning (`peft_config`)
- The name of the text field in the dataset (`dataset_text_field`)
```python
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset

# Load the training dataset
train_dataset = load_dataset(path="aboonaji/wiki_medical_terms_llam2_format", split="train")

# Create the LoRA fine-tuning configuration
lora_config = LoraConfig(task_type="CAUSAL_LM", r=64, lora_alpha=16, lora_dropout=0.1)

# Create the fine-tuning trainer
llama_sft_trainer = SFTTrainer(
    model=llama_model,
    args=training_arguments,
    train_dataset=train_dataset,
    tokenizer=llama_tokenizer,
    peft_config=lora_config,
    dataset_text_field="text",
)
```
- We utilize `load_dataset` to load the training dataset, assuming it's formatted according to the LLM's input requirements.
- We create the `LoraConfig` object with parameters suitable for the task (adjust these based on your specific LLM and dataset).
- Finally, we instantiate the `SFTTrainer` using the provided arguments and configurations.
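Before training, it's worth eyeballing a sample to confirm that the `text` field referenced by `dataset_text_field` actually exists and is formatted as expected (a minimal check, assuming the dataset loaded above):

```python
# Inspect the dataset: size, column names, and a truncated sample record.
print(len(train_dataset))
print(train_dataset.column_names)    # should include "text"
print(train_dataset[0]["text"][:200])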
## Training the Model

Once all components are in place, you can initiate the fine-tuning process by calling the `train()` method on the `llama_sft_trainer` object:
```python
# Start the fine-tuning process
llama_sft_trainer.train()
```
This code block will commence the fine-tuning, iterating over the training dataset and updating the model's parameters based on the provided data and the LoRA configuration. The training duration depends on various factors, including dataset size, hardware capabilities, and chosen hyperparameters.
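Once training finishes, you'll likely want to persist the result. A minimal sketch, assuming the run above completed (with PEFT, this saves only the small LoRA adapter weights, not the full base model; the directory name is an arbitrary choice):

```python
# Save the trained LoRA adapter and tokenizer for later reuse.
llama_sft_trainer.model.save_pretrained("./fine_tuned_llama")  # hypothetical path
llama_tokenizer.save_pretrained("./fine_tuned_llama")
```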
## Chatting with the Fine-Tuned Model

After successful fine-tuning, you can interact with the model to observe its learned capabilities:
```python
from transformers import pipeline

# Define a user prompt
user_prompt = "Please tell me about Ascariasis"

# Create a text-generation pipeline
text_generation_pipeline = pipeline(
    task="text-generation", model=llama_model, tokenizer=llama_tokenizer, max_length=300
)

# Generate a response to the user prompt, wrapped in Llama 2's instruction format
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")

# Print the generated response
print(model_answer[0]["generated_text"])
```
- We define a sample prompt for the model to respond to.
- We create a text-generation pipeline using the fine-tuned model and tokenizer.
- We utilize the pipeline to generate a response based on the user prompt, wrapping it in the special tokens `<s>` and `[INST] ... [/INST]` to frame the prompt appropriately for Llama 2's instruction format.
- Finally, we print the generated response from the fine-tuned model.
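Note that the `text-generation` pipeline echoes the prompt back as part of `generated_text`. If you only want the model's answer, a small post-processing step (a sketch, assuming the output above) does the trick:

```python
# Keep only the text after the closing [/INST] tag, i.e., the model's answer.
full_text = model_answer[0]["generated_text"]
answer = full_text.split("[/INST]", 1)[-1].strip()
print(answer)
```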
## Conclusion

This guide has provided a step-by-step process for fine-tuning a large language model with Hugging Face's `transformers`, `peft`, and `trl` libraries: loading a quantized pre-trained model, configuring training arguments, running supervised fine-tuning with LoRA, and chatting with the resulting model. From here, you can experiment with different base models, datasets, and hyperparameters to tailor an LLM to your own domain.