
Experimenting with RAG over my org-mode notes

Building a RAG Application with LangChain

Basic RAG with LangChain and Ollama

Environment Setup

I use my standard setup, with a dedicated conda environment for this project, specified with the direnv layout feature. Required libraries can be installed with:

pip3 install -r requirements.txt

Note that my .envrc is encrypted with git-crypt.

Using a Local LLM with Ollama

Ollama allows us to run open-source LLMs on local machines. This is useful for enhanced privacy since one’s private data is never shared with LLM providers.
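
As a minimal sketch of what talking to a local model looks like, the snippet below builds a request for Ollama's HTTP API using only the standard library. It assumes Ollama is serving on its default local address (`http://localhost:11434`) and that the `mixtral:latest` model has already been pulled; the helper names `build_generate_request` and `generate` are hypothetical and not part of this project.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="mixtral:latest", url=OLLAMA_URL):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="mixtral:latest", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

Calling `generate("Summarize org-mode in one sentence.")` only works while the Ollama server is running; the request never leaves the local machine.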

Retrieval Augmented Generation

RAG is the process of augmenting an LLM prompt with relevant context drawn from one or more documents. Typically, documents are broken into chunks of a fixed size, which may optionally overlap. Each chunk is tokenized and converted into a vector of real numbers in a process called embedding, and the vectors are stored in a vector database or index. At query time, the query is embedded the same way and used to run a similarity search against the index. The top matches are retrieved and added to the LLM prompt as context, alongside the user's original question.
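
The steps above can be sketched end to end in plain Python. This is a toy illustration, not what LangChain or Chroma do internally: the "embedding" is a bag-of-words count vector rather than a learned model, chunks are measured in words, and all function names are made up for the example.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words, each sharing `overlap` words with the next."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(documents, size=8, overlap=2):
    """Chunk every document and store each chunk next to its vector."""
    return [(chunk, embed(chunk))
            for doc in documents
            for chunk in chunk_words(doc, size, overlap)]


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Place the retrieved chunks in front of the user's question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The resulting prompt string is what would be sent to the model; swapping `embed` for a real embedding model and the list for a vector store gives the usual production shape.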

OrgModeDocumentStore class

This Python class organizes and wraps LangChain classes to provide a simplified RAG interface over org-mode documents.

from langchain_community.document_loaders import DirectoryLoader, UnstructuredOrgModeLoader
from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.retrievers import ParentDocumentRetriever
import os

class OrgModeDocumentStore:
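  def __init__(self, collection, directory, model="mixtral:latest",
               search_type="mmr", mmr_diversity=0.75,
               num_search_results=5, show_progress=False,
               silent_errors=False):
    self.collection = collection
    self.directory = directory
    if not os.path.exists(directory):
      raise RuntimeError(f"Directory {directory} does not exist.")

    # Keep the Chroma index in a hidden folder next to the notes.
    self.index_directory = os.path.join(directory, ".chroma")
    if not os.path.exists(self.index_directory):
      os.makedirs(self.index_directory)

    # silent_errors is passed by the query script; forwarding it to the
    # loader (so unparseable files are skipped) is an assumption.
    self.loader = DirectoryLoader(directory, glob="**/*.org", use_multithreading=True,
                                  loader_cls=UnstructuredOrgModeLoader,
                                  loader_kwargs={"mode": "single"},
                                  silent_errors=silent_errors)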

    self.search_type = search_type
    self.k = num_search_results
    self.diversity = mmr_diversity

    self.model = model
    self.embeddings = OllamaEmbeddings(model=model, show_progress=show_progress)
    self.db = Chroma(collection_name=collection,
                     embedding_function=self.embeddings,
                     persist_directory=self.index_directory)

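  def __repr__(self):
    # The original string body is not shown; this is a minimal summary of the configuration.
    return (f"OrgModeDocumentStore(collection={self.collection!r}, "
            f"directory={self.directory!r}, model={self.model!r})")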

  # indexing management
  def load(self):
    "Loads all org-mode documents found under the given directory recursively."
    self.documents = self.loader.load()

  def add_documents(self, docs):
    "Adds the given docs to the Chroma vectorstore and returns the document ids."
    return self.db.add_documents(docs)

  def update_document(self, id, doc):
    "Updates the single document identified by the id."
    return self.db.update_document(id, doc)

  def update_documents(self, ids, docs):
    "Updates the documents identified by the given ids."
    return self.db.update_documents(ids, docs)

  def create_index(self):
    "Creates the index from the loaded documents. This should only be run once."
    if len(self.documents) > 0:
      print(f"Indexing {len(self.documents)} documents.")
      return self.add_documents(self.documents)

  # query
  def print_documents(self):
    "Print the list of all documents."
    for d in self.documents:
      print(d.metadata["source"])

  def similarity_search(self, query):
    "Search the vectorstore for docs relevant to the query."
    return self.db.similarity_search(query, self.k)

  def mmr_search(self, query):
    "Executes max marginal relevance search for the query."
    return self.db.max_marginal_relevance_search(query, k=self.k, lambda_mult=self.diversity)

  def as_retriever(self):
    "Returns a retriever for this vectorstore."
    return self.db.as_retriever()
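
To make the `mmr_search` option more concrete, here is a minimal sketch of max marginal relevance over plain Python lists. It is an illustration of the technique, not Chroma's implementation, and the function names are made up. In LangChain's `max_marginal_relevance_search`, `lambda_mult` runs from 0 (maximum diversity) to 1 (minimum diversity), so the `mmr_diversity=0.75` default above leans toward relevance rather than diversity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=5, lambda_mult=0.5):
    """Return the indices of up to k documents chosen by max marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            # Reward similarity to the query ...
            relevance = cosine(query_vec, doc_vecs[i])
            # ... and penalize similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` this degenerates to a plain similarity ranking; lower values trade some relevance for results that differ from each other.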

Loading and Indexing (Chunked) Documents

The Document Loader abstraction presents a unified interface for loading various file types, including plain text, Markdown, JSON, and more. The constructor identifies the documents to load, and the load() method does the actual work.
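
The constructor/`load()` split can be illustrated with a small stand-in written against the standard library only. `SimpleOrgLoader` and this `Document` dataclass are hypothetical and are not LangChain classes; they only mimic the shape of the interface (a `page_content` string plus a `metadata` dict with a `source` key).

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


class SimpleOrgLoader:
    """Hypothetical stand-in for a directory loader; not a LangChain class."""

    def __init__(self, directory, glob="**/*.org"):
        # The constructor only records which files to load.
        self.directory = Path(directory)
        self.glob = glob

    def load(self):
        """Read every matching file and wrap it in a Document."""
        documents = []
        for path in sorted(self.directory.glob(self.glob)):
            if path.is_file():
                text = path.read_text(encoding="utf-8", errors="replace")
                documents.append(Document(text, {"source": str(path)}))
        return documents
```

Nothing touches the disk until `load()` is called, which is the same pattern `OrgModeDocumentStore.load()` relies on.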

Splitting Documents into Chunks

Text Splitters break long documents into smaller chunks so that each chunk fits within an LLM's context window.

Types of Splitters

Recursive: splits on user-defined characters and tries to keep related chunks next to each other.
Token: splits text on tokens.
Character: splits on a user-defined character.
Semantic chunker: splits on sentences, then combines adjacent ones if they are semantically similar enough.
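
As a rough illustration of the recursive strategy, the sketch below tries the coarsest separator first (blank lines), and only falls back to finer ones (newlines, spaces, then a hard cut) for pieces that are still too long. It is a simplified stand-in with made-up names, it does no chunk overlap, and it is not LangChain's implementation.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # Keep neighbouring pieces together while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with a finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]
```

Short paragraphs survive intact, and only an oversized paragraph gets broken at word boundaries.
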

Load the documents and create the index:

from orgstore import OrgModeDocumentStore
collection = "org-rag"
directory = "/Users/christian/Documents/personal/notes/content/roam/"
store = OrgModeDocumentStore(collection=collection, directory=directory, show_progress=True)
store.load()  # create_index() reads store.documents, which load() populates
document_ids = store.create_index()
print(f"create_index: {document_ids}")

# data = zip(document_ids, store.documents)
# for id, doc in data:
#   print(f"{id}: {doc.metadata['source']}")


Use the vector store to find relevant documents.

from orgstore import OrgModeDocumentStore
collection = "org-rag"
directory = "/Users/christian/Documents/personal/notes/content/roam/"
store = OrgModeDocumentStore(collection=collection, directory=directory, silent_errors=True)

store.load()  # populate store.documents so that '?list' can print them

i = 1
print("Enter search query at the prompt or type '?list' for docs, or '?quit' to exit.\n")
while True:
  query = input(f"{i}> ")
  if query.lower() == "?quit":
    break
  elif query.lower() == "?list":
    store.print_documents()
    i += 1
  else:
    i += 1
    #results = store.as_retriever().get_relevant_documents(query)
    #results = store.mmr_search(query)
    results = store.similarity_search(query)
    for doc in results:
      print(f"file: {doc.metadata['source']}, length: {len(doc.page_content)}")
      display = input("Display page content? (y|n)> ")
      if display.lower() == "y":
        print(f"content: {doc.page_content}\n")
        print("-" * 80)

I’m not thrilled with these results. The chunks are very small and anecdotally not the most relevant. I’d like to feed more context to an LLM.
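
One way to get more context per hit is to search over small chunks but return the larger document each chunk came from, which is the idea behind the `ParentDocumentRetriever` imported (but not yet used) in the class above. The sketch below shows that idea with a toy word-overlap score in plain Python; the function names are hypothetical and this is not LangChain's implementation.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, text):
    """Toy relevance score: number of word occurrences shared with the query."""
    q, t = tokens(query), tokens(text)
    return sum(min(q[w], t[w]) for w in q)


def build_child_index(parents, child_size=6):
    """Map each small child chunk back to the id of its parent document."""
    index = []
    for parent_id, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_size):
            index.append((" ".join(words[i:i + child_size]), parent_id))
    return index


def retrieve_parents(query, parents, index, k=2):
    """Rank child chunks, then return the distinct parent documents they belong to."""
    ranked = sorted(index, key=lambda item: overlap_score(query, item[0]), reverse=True)
    seen, results = set(), []
    for chunk, parent_id in ranked:
        if overlap_score(query, chunk) == 0 or len(results) == k:
            break
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results
```

Matching happens on the small chunks, where the signal is sharpest, while the text handed to the LLM is the full parent.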