/scratch-pdf-bot

Prototyping a question and answer bot over PDFs

Primary language: Python

GPT PDF Chatbot

This is a prototype of a chatbot that can answer questions about PDFs. It uses OpenAI's API for language modeling, and LanceDB for vector storage and retrieval.
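As a rough illustration of the retrieval step, the sketch below ranks text chunks by cosine similarity to a query vector and returns the best matches. It is a toy stand-in, not this project's code: in the real pipeline the vectors would come from OpenAI's embeddings API and the nearest-neighbour search would be done by LanceDB. The names `cosine_similarity` and `top_k` and the two-dimensional vectors are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical chunks with made-up 2-D "embeddings".
chunks = [
    ("chunk about attention", [1.0, 0.0]),
    ("chunk about datasets", [0.0, 1.0]),
    ("chunk about both", [1.0, 1.0]),
]
best = top_k([1.0, 0.1], chunks, k=2)
```

The retrieved chunks would then be passed to the language model as context alongside the user's question.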

Setup

This project uses Poetry for dependency management. To install dependencies:

$ poetry install

You'll also need to create a .env file and add your OPENAI_API_KEY to it (see .env.example).
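For reference, here is a minimal, standard-library-only sketch of how a .env file can be read into the process environment. The project itself may use a library such as python-dotenv instead; the function name `load_env_file` and the simple `KEY=VALUE` parsing rules are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=VALUE lines from a file and export them to os.environ.

    Blank lines, comment lines, and lines without "=" are skipped.
    Variables already set in the environment are not overwritten.
    """
    loaded: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop optional surrounding quotes around the value.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

After calling `load_env_file()`, the key is available as `os.environ["OPENAI_API_KEY"]`.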

Usage

The command below runs the pipeline on the papers directory, which contains a few PDFs, and then starts a REPL where you can ask questions about them. To exit the Q&A loop, type "exit" or press Ctrl+C.

$ poetry run python main.py --pdf_directory=papers
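The command-line flag and the Q&A loop can be sketched as follows. This is not the contents of main.py, only an illustration under stated assumptions: `parse_args`, `qa_loop`, and the `answer` callable are hypothetical names, and `answer` stands in for whatever function retrieves context and queries the model.

```python
import argparse
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the --pdf_directory flag shown in the command above."""
    parser = argparse.ArgumentParser(description="Q&A over a directory of PDFs")
    parser.add_argument(
        "--pdf_directory",
        required=True,
        help="Directory containing the PDF files to ingest",
    )
    return parser.parse_args(argv)


def qa_loop(
    answer: Callable[[str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> None:
    """Read questions until the user types "exit" or interrupts the loop."""
    while True:
        try:
            question = read("Question: ").strip()
        except (KeyboardInterrupt, EOFError):
            write("")  # finish the prompt line before leaving
            break
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore empty input and prompt again
        write(answer(question))
```

Passing `read` and `write` as parameters keeps the loop easy to exercise without a terminal; in normal use the defaults `input` and `print` apply.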

Note that the LanceDB database is included in this repo, so the ingestion code skips creating and storing embeddings unless you delete the .lancedb directory. The database is included so that you can run the code without waiting for the embeddings to be generated.
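The skip-if-present behaviour can be pictured with a small sketch like the one below. How the actual ingestion code detects existing data is not shown here, so treat this as an assumption: the check simply looks for a non-empty database directory, and `ingest_if_needed` and the `ingest` callable are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def ingest_if_needed(db_dir: Path, ingest: Callable[[Path], None]) -> bool:
    """Run `ingest` only when the database directory is missing or empty.

    Returns True if ingestion ran and False if it was skipped.
    """
    if db_dir.is_dir() and any(db_dir.iterdir()):
        return False  # existing data found, so reuse the stored embeddings
    ingest(db_dir)
    return True
```

Deleting the .lancedb directory would make the check fail and trigger a fresh ingestion on the next run.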

Example

/static/example.png