This project is an experimental sandbox for testing ideas around running local Large Language Models (LLMs) with Ollama to perform Retrieval-Augmented Generation (RAG) for answering questions over sample PDFs. It also uses Ollama to create embeddings with the nomic-embed-text model, which are stored and queried with Chroma.
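The core pattern looks roughly like the sketch below, assuming the langchain-community integrations for Ollama and Chroma; the model name, persist directory, and example texts are placeholders, and the scripts in this repo may structure things differently.

```python
# Hypothetical sketch of the core pattern (not the exact code in this repo):
# embed text chunks with Ollama's nomic-embed-text model and index them in Chroma.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Assumes an Ollama server on localhost:11434 with nomic-embed-text already pulled.
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Index a few example chunks; in this project the chunks come from the sample PDFs.
vectorstore = Chroma.from_texts(
    texts=["Chunk one of a PDF...", "Chunk two of a PDF..."],
    embedding=embeddings,
    persist_directory="chroma_db",  # hypothetical location
)

# Retrieve the chunks most relevant to a question.
docs = vectorstore.similarity_search("What does the document say about X?", k=2)
for doc in docs:
    print(doc.page_content)
```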
- Ollama version 0.1.26 or higher.
- Clone this repository to your local machine.
- Create a Python virtual environment by running `python3 -m venv .venv`.
- Activate the virtual environment by running `source .venv/bin/activate` on Unix or macOS, or `.\.venv\Scripts\activate` on Windows.
- Install the required Python packages by running `pip install -r requirements.txt`.
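After completing the steps above, an optional sanity check (not part of this repo) is to confirm the Ollama server is reachable and that the required models are available, for example via Ollama's REST API; the host below is an assumption.

```python
# Optional check: list the models available on the local Ollama server.
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # adjust if Ollama runs elsewhere

with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags") as resp:
    models = json.load(resp)

print([m["name"] for m in models.get("models", [])])
# The RAG scripts expect an embedding model such as nomic-embed-text to be listed;
# if it is missing, pull it with: ollama pull nomic-embed-text
```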
Creates embeddings for the provided PDF sources: `python3 setup.py -p <pdf_sources>`
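A rough outline of what such a setup step typically does is sketched below; the loader, splitter settings, `-p` argument handling, and persist directory are assumptions, and the actual `setup.py` may differ.

```python
# Hypothetical outline of an embedding-setup script; the real setup.py may differ.
import argparse
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--pdfs", nargs="+", required=True, help="PDF source files")
args = parser.parse_args()

# Load and chunk every PDF passed on the command line.
docs = []
for path in args.pdfs:
    docs.extend(PyPDFLoader(path).load())
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks with nomic-embed-text and persist them in Chroma.
Chroma.from_documents(
    chunks,
    OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="chroma_db",  # hypothetical location
)
```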
Spins up a chat using the provided PDFs as sources: `python3 app.py -p <pdf_sources>`
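The chat step generally reopens the persisted index and answers questions against it; the sketch below is only an illustration of that flow, with an assumed chat model name and persist directory, and the real `app.py` may be structured differently.

```python
# Hypothetical outline of the chat step; the real app.py may differ.
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Reopen the Chroma index created by the setup step.
vectorstore = Chroma(
    persist_directory="chroma_db",  # hypothetical location
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
)
llm = ChatOllama(model="mistral")  # assumed chat model; use whatever you have pulled

while True:
    question = input("Ask about the PDFs (empty to quit): ").strip()
    if not question:
        break
    # Retrieve the most relevant chunks and stuff them into the prompt.
    hits = vectorstore.similarity_search(question, k=4)
    context = "\n\n".join(d.page_content for d in hits)
    answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
    print(answer.content)
```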
Builds the image and generates the embeddings: `sudo docker build -t langchain_rag:0.0.3 --build-arg OLLAMA_HOST=http://<ollama_instance>:11434 .`
Starts a Jupyter instance on port 5001; the notebook entrypoint allows interacting with the chat: `sudo docker run --rm -e OLLAMA_HOST=http://<ollama_instance>:11434 --net host -it langchain_rag:0.0.3`
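Inside the container, the `OLLAMA_HOST` value passed to Docker could be picked up along the lines of the snippet below; this is a hypothetical helper, not necessarily how the repo's code reads it, and the chat model name is an assumption.

```python
# Hypothetical helper: point the LangChain Ollama integrations at the host
# given by the OLLAMA_HOST environment variable (e.g. set via docker run -e).
import os
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

base_url = os.environ.get("OLLAMA_HOST", "http://localhost:11434")

embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url=base_url)
llm = ChatOllama(model="mistral", base_url=base_url)  # assumed chat model
```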