This README provides instructions on building a LangChain-based application that interacts with a fine-tuned LLaMA 2 model. The guide covers setting up the environment, fine-tuning the model with QLoRA, creating a simple LangChain application, and running the app using Docker.
## Requirements

- Python 3.7+
- CUDA-compatible GPU (for model training and inference)
- Docker
- Hugging Face Transformers library
- PyTorch with CUDA support
- bitsandbytes
- LoRA (Low-Rank Adaptation)
- LangChain
- Streamlit
- Datasets library

## Setup

To set up this project, ensure you have the necessary libraries installed. We recommend using a virtual environment to keep the dependencies organized.
```bash
python -m venv langchain_venv
source langchain_venv/bin/activate   # Linux/macOS
.\langchain_venv\Scripts\activate    # Windows
pip install torch transformers datasets bitsandbytes peft trl accelerate langchain streamlit
```

Note that `trl` (for `SFTTrainer`) and `accelerate` are required for the fine-tuning step, and `streamlit` for the app front end.
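After installation, a quick sanity check confirms that PyTorch imports correctly and can see your GPU:

```python
# Verify the environment: PyTorch should import and report a CUDA device
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False`, reinstall PyTorch with a CUDA-enabled build before proceeding.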
## Fine-Tuning the Model

This section describes fine-tuning the LLaMA 2 model using QLoRA and supervised fine-tuning (SFT) with a specified dataset.
### Key Parameters for Fine-Tuning

- **LoRA parameters:** set the rank, alpha, and dropout for the LoRA layers.
- **bitsandbytes parameters:** enable 4-bit precision for memory efficiency.
- **Training arguments:** configure training settings such as epochs, batch size, learning rate, and gradient checkpointing.

### Fine-Tuning Process

1. **Load the dataset:** use `load_dataset` from `datasets`.
2. **Configure LoRA:** set up LoRA with the desired parameters.
3. **Create an SFT trainer:** use `SFTTrainer` to manage the fine-tuning process.
4. **Train the model:** execute `trainer.train()` to fine-tune the model.
5. **Save the fine-tuned model:** save the fine-tuned model to disk.

### Docker Setup for Fine-Tuning

You can use Docker to containerize the environment and ensure a consistent setup across different platforms.
```bash
docker build -t llama2-finetuning .
docker run --gpus all llama2-finetuning
```
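The `docker build` command expects a `Dockerfile` in the project root. A minimal sketch might look like this; the base image tag and the training script name are assumptions, so match them to your CUDA version and file layout:

```dockerfile
# CUDA-enabled PyTorch base image (tag is illustrative)
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Install the fine-tuning dependencies
RUN pip install --no-cache-dir transformers datasets bitsandbytes peft trl langchain

# Copy project files, including the training script (name assumed)
COPY . /app

CMD ["python", "finetune.py"]
```

The `--gpus all` flag in `docker run` requires the NVIDIA Container Toolkit to be installed on the host.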