Chat With PDF: Retrieval-Augmented Generation (RAG) Implementation

This repository contains code that demonstrates how to interact with a PDF document using Retrieval-Augmented Generation (RAG). The code loads a PDF, processes it into smaller chunks, stores the embeddings in a vector store, and performs similarity searches to retrieve relevant document chunks based on a query. It then formats the context and question into a prompt and generates a response using OpenAI's language model.

Overview

The goal of this implementation is to handle large PDF documents efficiently and provide accurate answers based on the content of the PDF. The main steps involved are:

  1. Loading and Ingesting Data: Load a PDF document and split it into smaller chunks.
  2. Preprocessing: Tokenize and preprocess the text for embedding generation.
  3. Vector Embeddings: Create and store vector embeddings for document chunks.
  4. Similarity Search: Perform similarity searches to find relevant document chunks.
  5. RAG Enriched Prompt: Create prompts with retrieved context and generate answers using an LLM (Large Language Model).

Image credit: phData

Setup

To get started, ensure you have the necessary dependencies installed. You can install them using the following commands:

pip install openai langchain chromadb tiktoken
pip install -U langchain-community
pip install pypdf
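
The code that follows uses the OpenAI API through the openai module, so an API key must be available first. A minimal sketch, assuming the key is stored in the standard OPENAI_API_KEY environment variable:

import os
import openai

# Read the key from the environment instead of hard-coding it in the notebook
openai.api_key = os.environ["OPENAI_API_KEY"]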

Code Walkthrough

Load and Ingest Data

Load a PDF document using PyPDFLoader and split it into smaller chunks.

from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; each page becomes a LangChain Document with page-level metadata
loader = PyPDFLoader('/content/attentionisallyouneed.pdf')
pages = loader.load()
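
As an optional sanity check, you can confirm how many pages were loaded and what each Document contains:

print(f"Loaded {len(pages)} pages")
print(pages[0].metadata)              # source file and page number
print(pages[0].page_content[:200])    # first 200 characters of the first page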

Preprocessing

Split the document into smaller chunks and tokenize the text.

from langchain.text_splitter import RecursiveCharacterTextSplitter
import tiktoken

# Split the pages into overlapping chunks so each piece stays well within the embedding model's context window
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(pages)

# Count tokens to estimate the embedding cost
encoding = tiktoken.encoding_for_model("text-embedding-3-small")
doc_tokens = [len(encoding.encode(chunk.page_content)) for chunk in docs]
total_tokens = sum(doc_tokens)
cost = total_tokens / 1_000_000 * 0.02  # text-embedding-3-small is priced per million tokens (about $0.02/1M at the time of writing)
print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${cost:.6f}")

Vector Embeddings

Create vector embeddings for the document chunks and store them using Chroma.

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OpenAIEmbeddings

# Embed each chunk and persist the vectors in a local Chroma collection
embedding_function = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=openai.api_key)
db = Chroma.from_documents(documents=docs, embedding=embedding_function, persist_directory='my-embeddings')
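
Because the embeddings are persisted to the my-embeddings directory, a later session can reopen the store without re-embedding the document. A minimal sketch, assuming the same embedding model:

# Reload the persisted collection instead of calling from_documents again
db = Chroma(persist_directory='my-embeddings', embedding_function=embedding_function)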

Similarity Search

Perform a similarity search to retrieve relevant document chunks based on a query.

# Retrieve the 5 most relevant chunks along with their relevance scores
results = db.similarity_search_with_relevance_scores('What is self-attention?', k=5)

for doc, score in results:
    print('score', score)
    print(doc)
    print('-------------------')
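
If you prefer LangChain's retriever interface, which many built-in chains expect, the vector store can also be wrapped as a retriever:

# Expose the vector store as a retriever that returns the top 5 chunks per query
retriever = db.as_retriever(search_kwargs={'k': 5})
relevant_docs = retriever.get_relevant_documents('What is self-attention?')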

RAG Enriched Prompt

Create a prompt with the retrieved context and generate an answer using OpenAI's language model.

from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain_community.chat_models import ChatOpenAI

question = 'Please give me an introduction to transformer architecture'
context_docs = db.similarity_search(question, k=5)

prompt = PromptTemplate(
    template="""Use the following pieces of context to answer the question at the end. If you don't know the answer, just say don't know. Do not try to make up the answer.

    <context>
    {context}
    </context>

    Question: {question}
    Helpful Answer:""",
    input_variables=["context", "question"]
)

llm = ChatOpenAI(model='gpt-4o-mini', temperature=0.9, openai_api_key=openai.api_key)
qa_chain = LLMChain(llm=llm, prompt=prompt)

# The chain takes the prompt's input variables as a dict
result = qa_chain.invoke({
    'question': question,
    'context': "\n".join([doc.page_content for doc in context_docs])
})

print(result['text'])  # LLMChain returns a dict; the generated answer is under the 'text' key
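
As an alternative to assembling the context by hand, LangChain's RetrievalQA chain can wire the retriever, the custom prompt, and the LLM together. A minimal sketch, assuming the same prompt and vector store as above:

from langchain.chains import RetrievalQA

# "stuff" concatenates the retrieved chunks into the prompt's {context} slot
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={'k': 5}),
    chain_type_kwargs={"prompt": prompt},
)

answer = rag_chain.invoke({"query": question})
print(answer['result'])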

Credits

This implementation is inspired by the blog post on Retrieval-Augmented Generation (RAG) by phData.