pdf-cli-chatbot

Interact with any PDF file from the terminal without using Langchain or LlamaIndex. At times you do not need frameworks like Langchain, this is a demo of how you can build a simple CLI chatbot without relying on LLM frameworks.

Tech stack

Python Argparse for CLI
ChromaDB as vector database
OpenAI chatgpt turbo 3.5

Read the article

https://www.analyticsvidhya.com/blog/2023/09/how-to-build-a-pdf-chatbot-without-langchain/

Workflow

Getting Started

Prerequisites

Python 3.11+
OpenAI API key

Installation

Clone the repository:

git clone https://github.com/AnthonyRonning/pdf-cli-chatbot.git
cd pdf-cli-chatbot

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up your OpenAI API key:

cp .env.example .env
# Edit .env and add your OpenAI API key

Usage

1. Load a PDF into the chatbot:

python cli.py -f path/to/your/document.pdf

Optional: Customize chunk size (default is 200 words):

python cli.py -f document.pdf -v 300

2. Ask questions about the PDF:

python cli.py -q "What is the main topic of this document?"

Get more context by fetching multiple relevant chunks:

python cli.py -q "Explain the methodology used in the study" -n 5

3. Clear the collection (when switching PDFs):

python cli.py -c True

Example Workflow

# Load a PDF
python cli.py -f bitcoin.pdf

# Ask questions
python cli.py -q "What problem does Bitcoin solve?" -n 3
python cli.py -q "How does the proof-of-work system work?" -n 5

# Clear collection before loading a new PDF
python cli.py -c True

# Load a different PDF
python cli.py -f research_paper.pdf

# Ask questions about the new PDF
python cli.py -q "What are the key findings?" -n 3

Command Line Options

-f, --file: Path to the PDF file to load
-q, --question: Question to ask about the loaded PDF
-n, --number: Number of relevant chunks to retrieve (default: 1)
-v, --value: Words per chunk when processing PDF (default: 200)
-c, --clear: Clear the existing collection (use when switching PDFs)