Example of Conversational Memory with RAG that cites sources
Running the UI code for Conversational Memory with RAG that cites sources
Note: The GIF demo shows the code executing in real time on an M2 Max and has not been sped up.
For running LLMs locally on a Mac I prefer LlamaIndex; see that demo here
This repository is an example of how to create and use Retrieval Augmented Generation (RAG) with LangChain. This is done using open-source models and does not require any API or paid service. Here are the libraries used:
- Vector Storage -> ChromaDB
- Embedding Model -> BAAI/bge-small-en-v1.5 model from HuggingFaceBgeEmbeddings
- LLM -> Mistral-11B-OmniMix, the 4-bit quantized GGUF version from TheBloke
- User Interface (UI) -> Chainlit
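As a rough sketch of how these pieces are typically wired together with LangChain, the embedding model and quantized LLM can be loaded as below. The model path, device setting, and parameter values are assumptions, not the exact configuration used in the notebooks:

```python
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.llms import LlamaCpp

# BGE embedding model from Hugging Face; "mps" targets the Apple Silicon GPU
embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={"device": "mps"},
    encode_kwargs={"normalize_embeddings": True},
)

# Local 4-bit GGUF model via llama.cpp; the file path is a placeholder
llm = LlamaCpp(
    model_path="./models/mistral-11b-omnimix.Q4_K_M.gguf",  # assumed local path
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal on Mac)
    n_ctx=4096,
    temperature=0.1,
)
```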
The create_vectorDB.ipynb notebook guides you through creating a vector database with Chroma DB, which stores embeddings produced by the Hugging Face BGE embedding model. This vector database is then used by the demo script for RAG.
The demo_RAG.ipynb notebook demonstrates how to use the created vector database to answer questions based on the documents it contains.
Use the create_vectorDB.ipynb notebook to create the LC_VectorDB (a sketch of these steps follows the list):
- Download an example PDF from arXiv
- Convert the PDF to LangChain Documents
- Prepare the documents by splitting the data
- Create and store the Vector DB
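A minimal sketch of those four steps with LangChain and Chroma is shown below. The arXiv URL, chunk sizes, and persist directory are placeholders; the notebook contains the actual values:

```python
import requests
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import Chroma

# 1. Download an example PDF from arXiv (placeholder paper)
pdf_url = "https://arxiv.org/pdf/2307.09288.pdf"
with open("paper.pdf", "wb") as f:
    f.write(requests.get(pdf_url).content)

# 2. Convert the PDF to LangChain Documents
docs = PyPDFLoader("paper.pdf").load()

# 3. Prepare the documents by splitting them into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 4. Create and persist the Chroma vector DB
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vectordb = Chroma.from_documents(
    chunks, embedding=embeddings, persist_directory="LC_VectorDB"
)
vectordb.persist()
```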
Run the demo_RAG.ipynb notebook, which will step you through 4 different examples (a sketch of the last example follows the list):
- Load the Foundational LLM and ask a question
- Use the LLM with RAG from LC_VectorDB
- Conversational Memory without RAG
- Conversational Memory with RAG and Sources
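For the last example, a hedged sketch using LangChain's ConversationalRetrievalChain with source documents returned is shown below. The model path, retriever settings, and question are placeholders, not the notebook's exact setup:

```python
from langchain.llms import LlamaCpp
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

llm = LlamaCpp(model_path="./models/mistral-11b-omnimix.Q4_K_M.gguf", n_ctx=4096)  # assumed path
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")

# Reload the vector DB persisted by create_vectorDB.ipynb
vectordb = Chroma(persist_directory="LC_VectorDB", embedding_function=embeddings)

# Memory needs an explicit output_key when source documents are also returned
memory = ConversationBufferMemory(
    memory_key="chat_history", output_key="answer", return_messages=True
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    return_source_documents=True,
)

result = qa_chain({"question": "What problem does the paper address?"})
print(result["answer"])
for doc in result["source_documents"]:
    print(doc.metadata)  # source file and page number, usable as a citation
```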
I developed this code on my M2 Max with 32GB of RAM. However, you can scale the embedding model and/or the LLM to better match your system. All of the necessary imports for Mac to utilize MPS are present in the notebooks.
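If you want to confirm that PyTorch can see the Apple GPU before loading the models, a quick general-purpose check (not specific to these notebooks) is:

```python
import torch

# The Metal Performance Shaders (MPS) backend is available on Apple Silicon Macs
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")
```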