This project is an experimental sandbox for testing ideas around running local Large Language Models (LLMs) with Ollama to perform Retrieval-Augmented Generation (RAG) for answering questions over sample PDFs. It also uses Ollama to create embeddings with the nomic-embed-text model, which are stored and queried with Chroma.
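The core pattern looks roughly like the sketch below, assuming the langchain-community integrations for Ollama and Chroma; the model name, persist directory, and example texts are placeholders, and the scripts in this repo may structure things differently.

```python
# Hypothetical sketch of the core pattern (not the exact code in this repo):
# embed text chunks with Ollama's nomic-embed-text model and index them in Chroma.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Assumes an Ollama server on localhost:11434 with nomic-embed-text already pulled.
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Index a few example chunks; in this project the chunks come from the sample PDFs.
vectorstore = Chroma.from_texts(
    texts=["Chunk one of a PDF...", "Chunk two of a PDF..."],
    embedding=embeddings,
    persist_directory="chroma_db",  # hypothetical location
)

# Retrieve the chunks most relevant to a question.
docs = vectorstore.similarity_search("What does the document say about X?", k=2)
for doc in docs:
    print(doc.page_content)
```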
- Ollama version 0.1.26 or higher.
- Clone this repository to your local machine.
- Create a Python virtual environment by running `python3 -m venv .venv`.
- Activate the virtual environment by running `source .venv/bin/activate` on Unix or macOS, or `.\.venv\Scripts\activate` on Windows.
- Install the required Python packages by running `pip install -r requirements.txt`.
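After completing the steps above, an optional sanity check (not part of this repo) is to confirm the Ollama server is reachable and that the required models are available, for example via Ollama's REST API; the host below is an assumption.

```python
# Optional check: list the models available on the local Ollama server.
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # adjust if Ollama runs elsewhere

with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags") as resp:
    models = json.load(resp)

print([m["name"] for m in models.get("models", [])])
# The RAG scripts expect an embedding model such as nomic-embed-text to be listed;
# if it is missing, pull it with: ollama pull nomic-embed-text
```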
Creates embeddings for the provided PDF sources: `python3 setup.py -p <pdf_sources>`
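A rough outline of what such a setup step typically does is sketched below; the loader, splitter settings, `-p` argument handling, and persist directory are assumptions, and the actual `setup.py` may differ.

```python
# Hypothetical outline of an embedding-setup script; the real setup.py may differ.
import argparse
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--pdfs", nargs="+", required=True, help="PDF source files")
args = parser.parse_args()

# Load and chunk every PDF passed on the command line.
docs = []
for path in args.pdfs:
    docs.extend(PyPDFLoader(path).load())
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks with nomic-embed-text and persist them in Chroma.
Chroma.from_documents(
    chunks,
    OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="chroma_db",  # hypothetical location
)
```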
Spins up a chat using the provided PDFs as sources: `python3 app.py -p <pdf_sources>`
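The chat step generally reopens the persisted index and answers questions against it; the sketch below is only an illustration of that flow, with an assumed chat model name and persist directory, and the real `app.py` may be structured differently.

```python
# Hypothetical outline of the chat step; the real app.py may differ.
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Reopen the Chroma index created by the setup step.
vectorstore = Chroma(
    persist_directory="chroma_db",  # hypothetical location
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
)
llm = ChatOllama(model="mistral")  # assumed chat model; use whatever you have pulled

while True:
    question = input("Ask about the PDFs (empty to quit): ").strip()
    if not question:
        break
    # Retrieve the most relevant chunks and stuff them into the prompt.
    hits = vectorstore.similarity_search(question, k=4)
    context = "\n\n".join(d.page_content for d in hits)
    answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
    print(answer.content)
```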
Builds the image and generates the embeddings: `sudo docker build -t langchain_rag:0.0.3 --build-arg OLLAMA_HOST=http://<ollama_instance>:11434 .`
Starts a Jupyter instance on port 5001; the notebook entrypoint allows interacting with the chat: `sudo docker run --rm -e OLLAMA_HOST=http://<ollama_instance>:11434 --net host -it langchain_rag:0.0.3`
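Inside the container, the `OLLAMA_HOST` value passed to Docker could be picked up along the lines of the snippet below; this is a hypothetical helper, not necessarily how the repo's code reads it, and the chat model name is an assumption.

```python
# Hypothetical helper: point the LangChain Ollama integrations at the host
# given by the OLLAMA_HOST environment variable (e.g. set via docker run -e).
import os
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

base_url = os.environ.get("OLLAMA_HOST", "http://localhost:11434")

embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url=base_url)
llm = ChatOllama(model="mistral", base_url=base_url)  # assumed chat model
```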