Example of Conversational Memory with RAG that cites sources
Running the UI code for Conversational Memory with RAG that cites sources
Note: The GIF demo shows the code executing in real time on an M2 Max and has not been sped up.
For running LLMs locally on a Mac I prefer LlamaIndex; see that demo here
This repository is an example of how to create and use Retrieval Augmented Generation (RAG) with LangChain. This is done using open-source models and does not require any API or paid service. Here are the libraries used:
- Vector Storage -> ChromaDB
- Embedding Model -> BAAI/bge-small-en-v1.5 model from HuggingFaceBgeEmbeddings
- LLM -> Mistral-11B-OmniMix, the 4-bit quantized GGUF version from TheBloke
- User Interface (UI) -> Chainlit
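As a rough sketch of how these pieces are typically wired together with LangChain, the embedding model and quantized LLM can be loaded as below. The model path, device setting, and parameter values are assumptions, not the exact configuration used in the notebooks:

```python
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.llms import LlamaCpp

# BGE embedding model from Hugging Face; "mps" targets the Apple Silicon GPU
embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={"device": "mps"},
    encode_kwargs={"normalize_embeddings": True},
)

# Local 4-bit GGUF model via llama.cpp; the file path is a placeholder
llm = LlamaCpp(
    model_path="./models/mistral-11b-omnimix.Q4_K_M.gguf",  # assumed local path
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal on Mac)
    n_ctx=4096,
    temperature=0.1,
)
```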
The create_vectorDB.ipynb notebook guides you through creating a vector database with Chroma DB, which stores embeddings produced by the Hugging Face BGE embedding model. This vector database is then used by the demo script for RAG.
The demo_RAG.ipynb notebook demonstrates how to use the created vector database to answer questions based on the documents it contains.
Use the create_vectorDB.ipynb notebook to create the LC_VectorDB (a sketch of these steps follows the list):
- Download an example PDF from arXiv
- Convert the PDF to LangChain Documents
- Prepare the documents by splitting the data
- Create and store the Vector DB
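A minimal sketch of those four steps with LangChain and Chroma is shown below. The arXiv URL, chunk sizes, and persist directory are placeholders; the notebook contains the actual values:

```python
import requests
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import Chroma

# 1. Download an example PDF from arXiv (placeholder paper)
pdf_url = "https://arxiv.org/pdf/2307.09288.pdf"
with open("paper.pdf", "wb") as f:
    f.write(requests.get(pdf_url).content)

# 2. Convert the PDF to LangChain Documents
docs = PyPDFLoader("paper.pdf").load()

# 3. Prepare the documents by splitting them into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 4. Create and persist the Chroma vector DB
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vectordb = Chroma.from_documents(
    chunks, embedding=embeddings, persist_directory="LC_VectorDB"
)
vectordb.persist()
```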
Run the demo_RAG.ipynb notebook, which will step you through 4 different examples (a sketch of the last example follows the list):
- Load the Foundational LLM and ask a question
- Use the LLM with RAG from LC_VectorDB
- Conversational Memory without RAG
- Conversational Memory with RAG and Sources
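For the last example, a hedged sketch using LangChain's ConversationalRetrievalChain with source documents returned is shown below. The model path, retriever settings, and question are placeholders, not the notebook's exact setup:

```python
from langchain.llms import LlamaCpp
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

llm = LlamaCpp(model_path="./models/mistral-11b-omnimix.Q4_K_M.gguf", n_ctx=4096)  # assumed path
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")

# Reload the vector DB persisted by create_vectorDB.ipynb
vectordb = Chroma(persist_directory="LC_VectorDB", embedding_function=embeddings)

# Memory needs an explicit output_key when source documents are also returned
memory = ConversationBufferMemory(
    memory_key="chat_history", output_key="answer", return_messages=True
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    return_source_documents=True,
)

result = qa_chain({"question": "What problem does the paper address?"})
print(result["answer"])
for doc in result["source_documents"]:
    print(doc.metadata)  # source file and page number, usable as a citation
```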
I developed this code on my M2 Max with 32GB of RAM. However, you can scale the embedding model and/or the LLM to better match your system. All of the necessary imports for Mac to utilize MPS are present in the notebooks.
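If you want to confirm that PyTorch can see the Apple GPU before loading the models, a quick general-purpose check (not specific to these notebooks) is:

```python
import torch

# The Metal Performance Shaders (MPS) backend is available on Apple Silicon Macs
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")
```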