This repository features a simple notebook that demonstrates how to use Unstructured to ingest and pre-process documents for a local Retrieval-Augmented Generation (RAG) application.
The goal of this repo is to avoid any cloud services or external APIs and to run everything locally, demonstrating that RAG applications can be built on fully siloed infrastructure.
- This Jupyter notebook will try to pre-process all files in the "files_used" directory. The "files_used" directory in this repo contains a few PDFs related to the NFL.
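For context, here is a minimal sketch of what that pre-processing step can look like with the unstructured library's partition API; the directory name comes from this repo, but the exact code in the notebook may differ.

```python
from pathlib import Path

from unstructured.partition.auto import partition  # auto-detects the file type (PDFs here)

# Partition every file in the files_used directory into structured elements
for path in Path("files_used").iterdir():
    elements = partition(filename=str(path))
    print(f"{path.name}: {len(elements)} elements")
    # Each element carries text plus metadata (page number, element type, ...)
    # that the notebook can chunk, embed, and load into the vector store.
```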
- If you don't use a GPU or another type of accelerator, embedding generation and LLM generation will be slower.
- Install Python (version 3.9 or later).
- Install Docker, Docker Compose, and Docker Desktop, and make sure Docker Desktop is running. Installation instructions are here.
- Clone this repository by running the following command:
git clone git@github.com:Unstructured-IO/local-RAG-demo.git
- cd into the cloned repository, create a virtual environment, and install the requirements:
cd local-RAG-demo #enter local-RAG-demo directory
python3.10 -m venv env #create venv called env
source env/bin/activate #activate environment
pip install -r requirements.txt #install required packages
- Install llama-cpp-python and download the Llama 2 GGUF model. This is slightly different for every OS, but here is a link with download instructions. Below is how to install and download on macOS (Apple Silicon).
mkdir model_files #make model files folder to store Llama 2 model files
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python #install llama-cpp-python with Metal support for Apple Silicon chips
huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF --local-dir model_files --local-dir-use-symlinks False --include='*Q4_K*gguf' #download model
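Once the download finishes, the model can be loaded from Python roughly as follows; the filename and parameters below are assumptions, so check what actually landed in model_files.

```python
from llama_cpp import Llama

# Assumes the Q4_K_M quantization was pulled by the huggingface-cli command above
llm = Llama(
    model_path="model_files/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to Metal; use 0 for CPU-only
    n_ctx=4096,       # context window size
)

result = llm("Q: What is Retrieval-Augmented Generation? A:", max_tokens=128)
print(result["choices"][0]["text"])
```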
- Start the Docker container to spin up the Weaviate vector database:
docker-compose up -d
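Once the container is up, you can check that Weaviate is reachable before running the notebook. This sketch assumes the compose file exposes Weaviate on the default port 8080 and uses the v3 weaviate-client syntax; adjust if the notebook uses the v4 client.

```python
import weaviate

# Connect to the local Weaviate instance started by docker-compose
client = weaviate.Client("http://localhost:8080")
print("Weaviate ready:", client.is_ready())
```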