
Primary language: Python. License: MIT.

pdf-cli-chatbot

Interact with any PDF file from the terminal without using LangChain or LlamaIndex. Sometimes you do not need a framework like LangChain; this demo shows how to build a simple CLI chatbot without relying on LLM frameworks.

Tech stack

  1. Python Argparse for CLI
  2. ChromaDB as vector database
  3. OpenAI GPT-3.5 Turbo

Read the article

https://www.analyticsvidhya.com/blog/2023/09/how-to-build-a-pdf-chatbot-without-langchain/

Workflow

[Workflow diagram]

Getting Started

Prerequisites

  • Python 3.11+
  • OpenAI API key

Installation

  1. Clone the repository:
git clone https://github.com/AnthonyRonning/pdf-cli-chatbot.git
cd pdf-cli-chatbot
  2. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set up your OpenAI API key:
cp .env.example .env
# Edit .env and add your OpenAI API key
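
To check that the key from the last step is visible to Python, something like the following standard-library sketch can help. It is not taken from this repository: the variable name OPENAI_API_KEY is an assumption (check .env.example for the actual name), and load_env_file and get_api_key are hypothetical helpers.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Values already present in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_api_key():
    """Return the API key or fail with a readable message.

    OPENAI_API_KEY is an assumed variable name, not confirmed by the repo.
    """
    load_env_file()
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

Calling get_api_key() from the project root should return the key if .env is filled in, and raise a RuntimeError otherwise.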

Usage

1. Load a PDF into the chatbot:

python cli.py -f path/to/your/document.pdf

Optional: Customize chunk size (default is 200 words):

python cli.py -f document.pdf -v 300
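
As a rough illustration of what a word-based chunk size means, here is a minimal sketch that splits text into fixed-size groups of words. The function name chunk_by_words is hypothetical, and the repository's actual splitting logic may differ (for example, it might overlap chunks or respect sentence boundaries).

```python
def chunk_by_words(text, words_per_chunk=200):
    """Split text into consecutive chunks of at most words_per_chunk words."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = text.split()  # split on any whitespace
    return [
        " ".join(words[start:start + words_per_chunk])
        for start in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    sample = "one two three four five"
    print(chunk_by_words(sample, words_per_chunk=2))
```

Smaller chunks give more precise matches but less context per chunk; larger chunks do the opposite, which is why the value is exposed as an option.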

2. Ask questions about the PDF:

python cli.py -q "What is the main topic of this document?"

Get more context by fetching multiple relevant chunks:

python cli.py -q "Explain the methodology used in the study" -n 5
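
The idea behind -n can be sketched without any external services: rank the stored chunks by similarity to the question, keep the top n, and place them in the prompt as context. The sketch below swaps in a simple bag-of-words cosine similarity as a stand-in for the embedding search that ChromaDB performs, so it only illustrates the shape of the step. The names top_n_chunks and build_prompt and the prompt wording are assumptions, not the repository's code.

```python
import math
from collections import Counter


def _vector(text):
    """Bag-of-words term counts (a stand-in for a real embedding)."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_n_chunks(question, chunks, n=1):
    """Return the n chunks most similar to the question, best first."""
    query = _vector(question)
    ranked = sorted(chunks, key=lambda c: _cosine(query, _vector(c)), reverse=True)
    return ranked[:n]


def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into a single prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

A larger n gives the model more material to draw on, at the cost of a longer prompt and more tokens per request.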

3. Clear the collection (when switching PDFs):

python cli.py -c True

Example Workflow

# Load a PDF
python cli.py -f bitcoin.pdf

# Ask questions
python cli.py -q "What problem does Bitcoin solve?" -n 3
python cli.py -q "How does the proof-of-work system work?" -n 5

# Clear collection before loading a new PDF
python cli.py -c True

# Load a different PDF
python cli.py -f research_paper.pdf

# Ask questions about the new PDF
python cli.py -q "What are the key findings?" -n 3

Command Line Options

  • -f, --file: Path to the PDF file to load
  • -q, --question: Question to ask about the loaded PDF
  • -n, --number: Number of relevant chunks to retrieve (default: 1)
  • -v, --value: Words per chunk when processing PDF (default: 200)
  • -c, --clear: Clear the existing collection (use when switching PDFs)
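
For readers curious how options like these map onto Argparse, here is a minimal sketch of a parser with the same flags and defaults. It is an illustration, not the repository's cli.py: build_parser and str_to_bool are hypothetical names, and the boolean converter is one assumed way to support the `-c True` form shown above (passing type=bool to Argparse would treat any non-empty string, including "False", as true).

```python
import argparse


def str_to_bool(value):
    """Convert strings such as 'True' or 'false' to a bool for argparse."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the options listed in this README."""
    parser = argparse.ArgumentParser(
        prog="cli.py", description="Chat with a PDF from the terminal"
    )
    parser.add_argument("-f", "--file", help="path to the PDF file to load")
    parser.add_argument("-q", "--question", help="question to ask about the PDF")
    parser.add_argument("-n", "--number", type=int, default=1,
                        help="number of relevant chunks to retrieve")
    parser.add_argument("-v", "--value", type=int, default=200,
                        help="words per chunk when processing the PDF")
    parser.add_argument("-c", "--clear", type=str_to_bool, default=False,
                        help="clear the existing collection")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```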