Chat With PDFs

Chat with your PDF files for free, using LangChain, Groq, the Chroma vector store, and Jina AI embeddings. This repository contains a simple Python implementation of a RAG (Retrieval-Augmented Generation) system. It retrieves the chunks of your PDF file that are most relevant to each query and passes them to the language model, which uses them to generate an informative response.
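To make the retrieval step concrete, here is a minimal, dependency-free sketch of the idea. It is not the code in this repository: a bag-of-words vector stands in for the Jina embeddings, a plain list stands in for the Chroma collection, and the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a real RAG pipeline the prompt returned by `build_prompt` would be sent to the LLM; here it only shows how the retrieved chunks and the question are combined.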

Installation

Follow these steps:

Clone the repository:

git clone https://github.com/S4mpl3r/chat-with-pdf.git

Create a virtual environment and activate it (optional, but highly recommended).

python -m venv .venv
Windows: .venv\Scripts\activate
Linux/macOS: source .venv/bin/activate

Install required packages:

python -m pip install -r requirements.txt

Create a .env file in the root of the project and populate it with the following keys. You'll need to obtain your own API keys from Jina AI and Groq, and an access token from Hugging Face:
```
JINA_API_KEY=<YOUR KEY>
GROQ_API_KEY=<YOUR KEY>
HF_TOKEN=<YOUR TOKEN>
HF_HOME=<PATH TO STORE HUGGINGFACE MODEL>
```
Run the program:
```
python main.py
```
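If the program fails at startup, a missing or empty entry in .env is a common cause. The sketch below checks a .env file for the four keys listed above using only the standard library. It is an illustration, not part of this repository: the helper names `parse_env` and `missing_keys` are hypothetical, and treating all four keys as required is an assumption.

```python
# Keys taken from the .env template above; assumed here to all be required.
REQUIRED_KEYS = ("JINA_API_KEY", "GROQ_API_KEY", "HF_TOKEN", "HF_HOME")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when every key has a value.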

Configuration

You can customize the behavior of the system by modifying the constants and parameters in the main.py file:

EMBED_MODEL_NAME: Specify the name of the Jina embedding model to be used.
LLM_NAME: Specify the name of the language model (Refer to Groq for the list of available models).
LLM_TEMPERATURE: Set the temperature parameter for the language model.
CHUNK_SIZE: Specify the maximum chunk size allowed by the embedding model.
DOCUMENT_DIR: Specify the directory where PDF documents are stored.
VECTOR_STORE_DIR: Specify the directory where vector embeddings are stored.
COLLECTION_NAME: Specify the name of the collection for the chroma vector store.
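As a rough illustration of how these constants might look and how CHUNK_SIZE bounds the text sent to the embedding model, consider the sketch below. Every value is a hypothetical placeholder rather than the project's actual default, and `split_text` is a simplified character-based splitter written for this example; the real limit of an embedding model is usually expressed in tokens, and main.py may split documents differently.

```python
# Hypothetical placeholder values; check main.py for the real defaults.
EMBED_MODEL_NAME = "<jina-embedding-model>"  # name of the Jina embedding model
LLM_NAME = "<groq-model-name>"               # any model offered by Groq
LLM_TEMPERATURE = 0.1                        # lower values give more deterministic answers
CHUNK_SIZE = 1000                            # maximum chunk size (characters in this sketch)
DOCUMENT_DIR = "./documents"                 # where the PDF files live
VECTOR_STORE_DIR = "./vectorstore"           # where embeddings are persisted
COLLECTION_NAME = "pdf_chunks"               # Chroma collection name


def split_text(text, chunk_size=CHUNK_SIZE):
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

A smaller CHUNK_SIZE produces more, shorter chunks, which tends to make retrieval more precise at the cost of less context per chunk.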

Resources

Kudos to the amazing libraries and services used in this project: LangChain, Groq, Chroma, and Jina AI.

License

MIT