Building a LangChain App with Fine-Tuned LLaMA 2 in Docker

This README provides instructions on building a LangChain-based application that interacts with a fine-tuned LLaMA 2 model. The guide covers setting up the environment, fine-tuning the model with QLoRA, creating a simple LangChain application, and running the app using Docker.

Requirements Python 3.7+ CUDA-compatible GPU (for model training and inference) Docker Hugging Face Transformers library PyTorch with CUDA support bitsandbytes LoRA (Low-Rank Adaptation) LangChain Streamlit Datasets library Setup To set up this project, ensure you have the necessary libraries installed. We recommend using a virtual environment to keep the dependencies organized.

Create a virtual environment

python -m venv langchain_venv

Activate the virtual environment

source langchain_venv/bin/activate # Linux/macOS

or

.\langchain_venv\Scripts\activate # Windows

Install the required libraries

pip install torch transformers datasets bitsandbytes peft langchain

Fine-Tuning the LLaMA 2 Model

This section describes fine-tuning the LLaMA 2 model using QLoRA and supervised fine-tuning (SFT) with a specified dataset.

Key Parameters for Fine-Tuning LoRA Parameters: Set the rank, alpha, and dropout for LoRA layers. bitsandbytes Parameters: Enable 4-bit precision for memory efficiency. Training Arguments: Configure training settings such as epochs, batch size, learning rate, and gradient checkpointing. Fine-Tuning Process Load the Dataset: Use load_dataset from datasets. Configure LoRA: Set up LoRA with the desired parameters. Create an SFT Trainer: Use SFTTrainer to manage the fine-tuning process. Train the Model: Execute trainer.train() to fine-tune the model. Save the Fine-Tuned Model: Save the fine-tuned model to disk. Docker Setup for Fine-Tuning You can use Docker to containerize the environment and ensure consistent setup across different platforms.

Dockerfile for Training

Build the Docker image

docker build -t llama2-finetuning .

Run the Docker container (with GPU support)

docker run --gpus all llama2-finetuning