# Fine-Tuning LLMs (Large Language Models) with Hugging Face

Description:

This guide equips you with step-by-step instructions on fine-tuning large language models (LLMs) using Hugging Face's powerful libraries and techniques. It leverages the transformers and trl libraries to demonstrate fine-tuning a pre-trained LLM for a specific task, enabling you to customize its capabilities beyond its initial functionality.

Prerequisites:

  • Basic understanding of Python programming
  • Familiarity with large language models and their applications
  • Working knowledge of Hugging Face libraries (recommended but not essential)

Table of Contents:

  1. Introduction
  2. Installation and Setup
  3. Loading the Model and Tokenizer
  4. Configuring Training Arguments
  5. Creating the Fine-Tuning Trainer
  6. Training the Model
  7. Chatting with the Fine-Tuned Model
  8. Conclusion

1. Introduction

Large language models (LLMs) are powerful tools capable of generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. However, their out-of-the-box capabilities may not always be tailored to your specific needs. Fine-tuning allows you to customize an LLM for a particular task, enhancing its performance and relevance in your chosen domain.

This guide utilizes Hugging Face libraries to demonstrate fine-tuning a pre-trained LLM for a specific objective. We'll walk you through the steps of loading the model, preparing the training environment, and fine-tuning it on a given dataset. Finally, we'll explore how to interact with the fine-tuned model to experience its customized capabilities.

2. Installation and Setup

Before starting, ensure you have Python and pip installed on your system. Here's how to install the required libraries using pip:

```python
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
!pip install huggingface_hub
```

  • accelerate: Provides optimization tools for training.
  • peft: Enables parameter-efficient fine-tuning, including Low-Rank Adaptation (LoRA).
  • bitsandbytes: Offers quantization techniques for model optimization.
  • transformers: Core library for loading and working with pre-trained models from the Hugging Face ecosystem.
  • trl: Provides the SFTTrainer class for Supervised Fine-Tuning (SFT) of pre-trained models.
  • huggingface_hub: Facilitates access to pre-trained models and datasets from the Hugging Face Hub.
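
If you want to verify that the pinned versions were picked up, a quick sanity check (an addition to the original walkthrough, not a required step) is to print each library's version:

```python
# Optional sanity check: confirm the pinned library versions are active
import accelerate, peft, bitsandbytes, transformers, trl

for library in (accelerate, peft, bitsandbytes, transformers, trl):
    print(library.__name__, library.__version__)
```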

3. Loading the Model and Tokenizer

This guide utilizes a pre-trained LLM named aboonaji/llama2finetune-v2 from the Hugging Face Hub. We'll load both the model and its corresponding tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model with a 4-bit quantization configuration for efficiency
llama_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
    ),
)

# Disable the generation cache and the pretraining tensor-parallelism slicing
llama_model.config.use_cache = False
llama_model.config.pretraining_tp = 1

# Load the tokenizer
llama_tokenizer = AutoTokenizer.from_pretrained("aboonaji/llama2finetune-v2")

# Pad with the end-of-sequence token, padding on the right side
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"
```

  • We use AutoModelForCausalLM.from_pretrained to load the model, specifying the model name and a quantization_config that loads the weights in 4-bit NF4 precision with float16 compute.
  • config.use_cache is set to False because the key/value cache only benefits generation; during fine-tuning it is unnecessary and wastes memory.
  • config.pretraining_tp is set to 1 to disable the tensor-parallel slicing used when Llama 2 was pre-trained, so the standard linear-layer computation is used.
  • We use AutoTokenizer.from_pretrained to load the tokenizer, reuse the end-of-sequence token as the padding token, and pad sequences on the right.
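
As an optional check (an addition to the original walkthrough), you can confirm how compact the 4-bit model is; get_memory_footprint is a standard method on Hugging Face models:

```python
# Optional: inspect how much memory the quantized weights occupy
print(f"Model memory footprint: {llama_model.get_memory_footprint() / 1e9:.2f} GB")
```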

4. Configuring Training Arguments

The next step involves defining training arguments using the TrainingArguments class. These arguments control various aspects of the fine-tuning process, such as:

  • Where to save training outputs (output_dir)
  • The number of training examples processed on each device per batch (per_device_train_batch_size)
  • The total number of training steps (max_steps)

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    max_steps=100,
)
```

  • We set output_dir to "./results" to store the training results in a folder named "results".
  • We set per_device_train_batch_size to 4, indicating that 4 training examples will be processed on each device (e.g., GPU) per batch.
  • We set max_steps to 100, limiting training to 100 optimization steps (adjust this based on your dataset size and how thoroughly you want to fine-tune).
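
TrainingArguments accepts many more options than the three shown above. The variant below adds a learning rate, gradient accumulation, and loss logging; these are standard parameters of the class, but the specific values are illustrative starting points rather than settings prescribed by this guide:

```python
from transformers import TrainingArguments

# A more fully specified variant; the extra values are illustrative starting points
training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,  # accumulate gradients for an effective batch size of 8
    learning_rate=2e-4,             # a common starting point for LoRA fine-tuning
    logging_steps=10,               # report the training loss every 10 steps
    max_steps=100,
)
```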

5. Creating the Fine-Tuning Trainer

Now, we'll create the SFTTrainer object from the trl library. This trainer orchestrates the fine-tuning process, handling elements like:

  • The model (model)
  • The training arguments (args)
  • The training dataset (train_dataset)
  • The tokenizer (tokenizer)
  • The configuration for Low-Rank Adaptation (LoRA) fine-tuning (peft_config)
  • The name of the text field in the dataset (dataset_text_field)

```python
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset

# Load the training dataset
train_dataset = load_dataset(path="aboonaji/wiki_medical_terms_llam2_format", split="train")

# Create the LoRA fine-tuning configuration
lora_config = LoraConfig(task_type="CAUSAL_LM", r=64, lora_alpha=16, lora_dropout=0.1)

# Create the fine-tuning trainer
llama_sft_trainer = SFTTrainer(
    model=llama_model,
    args=training_arguments,
    train_dataset=train_dataset,
    tokenizer=llama_tokenizer,
    peft_config=lora_config,
    dataset_text_field="text",
)
```

  • We utilize load_dataset to load the training dataset, assuming it's formatted according to the LLM's input requirements.
  • We create the LoraConfig object with parameters suitable for the task (adjust these based on your specific LLM and dataset).
  • Finally, we instantiate the SFTTrainer using the provided arguments and configurations.
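
Because peft_config is supplied, the trainer wraps the model with LoRA adapters. If you are curious how few parameters are actually trained, PEFT-wrapped models expose print_trainable_parameters; this inspection step is an addition here, not part of the original guide:

```python
# Optional: show trainable vs. total parameter counts after LoRA wrapping
llama_sft_trainer.model.print_trainable_parameters()
```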

6. Training the Model

Once all components are in place, you can initiate the fine-tuning process by calling the train() method on the llama_sft_trainer object:

```python
# Start the fine-tuning process
llama_sft_trainer.train()
```

This code block will commence the fine-tuning, iterating over the training dataset and updating the model's parameters based on the provided data and the LoRA configuration. The training duration depends on various factors, including dataset size, hardware capabilities, and chosen hyperparameters.
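
The guide does not cover persisting the result, but if you want to keep the learned LoRA weights, the trainer's model and the tokenizer can be saved with the standard save_pretrained method (the directory name below is just an example):

```python
# Save the trained LoRA adapter and tokenizer (directory name is arbitrary)
llama_sft_trainer.model.save_pretrained("./results/llama2-finetuned-adapter")
llama_tokenizer.save_pretrained("./results/llama2-finetuned-adapter")
```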

7. Chatting with the Fine-Tuned Model

After successful fine-tuning, you can interact with the model to observe its learned capabilities:

```python
from transformers import pipeline

# Define a user prompt
user_prompt = "Please tell me about Ascariasis"

# Create a text-generation pipeline
text_generation_pipeline = pipeline(
    task="text-generation", model=llama_model, tokenizer=llama_tokenizer, max_length=300
)

# Generate a response to the user prompt
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")

# Print the generated response
print(model_answer[0]["generated_text"])
```

  • We define a sample prompt for the model to respond to.
  • We create a text-generation pipeline using the fine-tuned model and tokenizer.
  • We utilize the pipeline to generate a response to the user prompt, wrapping it in the Llama 2 chat markers <s>, [INST], and [/INST] so the model sees the format it was trained on.
  • Finally, we print the generated response from the fine-tuned model.
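
Generation can be tuned further through standard sampling parameters; the values below are illustrative choices rather than settings from the original guide:

```python
# Enable sampling for more varied answers (values are illustrative)
model_answer = text_generation_pipeline(
    f"<s>[INST] {user_prompt} [/INST]",
    do_sample=True,
    temperature=0.7,  # lower values give more focused output, higher more varied
    top_p=0.9,        # nucleus sampling: restrict to the top 90% probability mass
)
print(model_answer[0]["generated_text"])
```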

8. Conclusion

This guide has provided a step-by-step process for fine-tuning a large language model with Hugging Face's transformers, peft, and trl libraries: installing the dependencies, loading a quantized model and its tokenizer, configuring the training arguments, fine-tuning with SFTTrainer and LoRA, and finally chatting with the customized model. You can apply the same workflow to your own models and datasets to tailor an LLM to your chosen domain.