LlamaIndex_RAG_Memory

Running the UI code for Conversational Memory with RAG that cites its sources

[GIF demo of the Chainlit UI]

Note: The GIF demo is a real-time recording of the code running on an M2 Max and has not been sped up.

Click here for a longer demo

Introduction

This repository is an example of how to create and use Retrieval-Augmented Generation (RAG) with LlamaIndex. It uses open-source models and does not require any API key or paid service.
However, the RAG pipeline also works with paid APIs such as GPT-4 Turbo; just change the llm input in the service_context.
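
As a rough sketch only (the exact import paths depend on your LlamaIndex version, and the real service_context is defined in the notebooks), swapping in an API-backed model might look like this:

```python
# Illustrative sketch, assuming the legacy (pre-0.10) llama_index package layout
# and an OPENAI_API_KEY in the environment. The model name is an assumption.
from llama_index import ServiceContext
from llama_index.llms import OpenAI

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-4-turbo-preview", temperature=0.1),
    embed_model="local:BAAI/bge-small-en-v1.5",  # keep the local embedding model
)
```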

Here are the libraries used:

  1. Vector Storage -> ChromaDB
  2. Embedding Model -> BAAI/bge-small-en-v1.5 from HuggingFaceEmbeddings
  3. LLM -> Mistral-11B-OmniMix, the 4-bit quantized GGUF version from TheBloke
  4. The GGUF model is loaded with LlamaCPP, which results in solid performance (see the sketch after this list)
  5. User Interface (UI) -> Chainlit

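Here is a minimal sketch of how the GGUF model and the BGE embeddings might be wired together. The model path, parameter values, and import locations are assumptions; the notebooks contain the actual setup.

```python
# Hypothetical sketch: load the quantized GGUF model with LlamaCPP and the
# BGE embedding model, then bundle them into a ServiceContext.
from llama_index import ServiceContext
from llama_index.llms import LlamaCPP
from llama_index.embeddings import HuggingFaceEmbedding

llm = LlamaCPP(
    model_path="./models/mistral-11b-omnimix.Q4_K_M.gguf",  # assumed local path
    temperature=0.1,
    max_new_tokens=512,
    context_window=4096,
    model_kwargs={"n_gpu_layers": -1},  # offload layers to Metal/GPU if available
)

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
```
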
Vector Database and RAG with LlamaIndex

The create_VectorDB.ipynb notebook guides you through creating a vector database with ChromaDB, which stores embeddings produced by a Hugging Face embedding model. This vector database is then used by the demo notebook for RAG.

The demo_RAG.ipynb notebook demonstrates how to utilize the created vector database to answer questions based on the documents it contains.

Part 1: Creating the Vector Database with ChromaDB and Hugging Face Embeddings

Use create_VectorDB.ipynb to create the RAG_VectorDB (a code sketch of these steps follows the list):

  1. Download an example PDF from arXiv
  2. Convert the PDF to LlamaIndex Documents
  3. Convert Documents into LlamaIndex Nodes
  4. Create and store the Vector DB
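
A condensed sketch of that flow is shown below. The file names and collection name are assumptions made for illustration; see create_VectorDB.ipynb for the actual code.

```python
# Hypothetical sketch of the vector-DB creation flow described above.
import chromadb
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

# Steps 1-2: load the downloaded PDF into LlamaIndex Documents
documents = SimpleDirectoryReader(
    input_files=["./data/example_arxiv_paper.pdf"]  # assumed file name
).load_data()

# Steps 3-4: chunk into Nodes (handled internally by the index) and
# persist the embeddings to a local Chroma collection
chroma_client = chromadb.PersistentClient(path="./RAG_VectorDB")
collection = chroma_client.get_or_create_collection("rag_demo")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,  # from the LlamaCPP/BGE sketch above
)
```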

Part 2: Utilizing the Vector Database with an Open Source LLM Model

Run demo_RAG.ipynb, which steps you through the following examples (a sketch of the memory-with-sources flow follows the list):

  1. Load the Foundational LLM and ask a question
  2. Use the LLM with RAG from RAG_VectorDB
  3. Conversational Memory with RAG and Sources
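
The example below is a rough sketch of the last step: loading the stored Chroma collection and chatting with conversational memory while surfacing sources. The paths, collection name, and prompt are assumptions; the notebook contains the actual implementation.

```python
# Hypothetical sketch: reload the vector store, build a chat engine with
# conversational memory, and print the retrieved sources for citation.
import chromadb
from llama_index import VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

chroma_client = chromadb.PersistentClient(path="./RAG_VectorDB")
collection = chroma_client.get_or_create_collection("rag_demo")
vector_store = ChromaVectorStore(chroma_collection=collection)

index = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=service_context,  # from the LlamaCPP/BGE sketch above
)

# "condense_question" rewrites each follow-up question using the chat history,
# which is what gives the RAG pipeline conversational memory.
chat_engine = index.as_chat_engine(chat_mode="condense_question")

response = chat_engine.chat("What problem does the paper address?")
print(response.response)

# Each source node carries the retrieved chunk plus metadata for citation.
for source in response.source_nodes:
    print(source.node.metadata.get("file_name"), source.score)
```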

Performance

This code was developed on an M2 Max with 32 GB of RAM. However, you can scale the embedding model and/or the LLM to better match your system.
All of the necessary imports for a Mac to utilize MPS are present in the notebooks.
On an M2 Max, the time from prompt to completed answer is roughly comparable to GPT-4 and is more than usable.
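
As a purely illustrative check (not taken from the notebooks), you can confirm that PyTorch sees Apple's MPS backend before running the embedding model:

```python
# Hypothetical sketch: pick MPS on Apple Silicon when available, otherwise CPU.
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Embedding model device: {device}")
```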