Query PDFs using GPT - OpenAI + Pinecone (vector store) + LlamaIndex
The easiest way to run the app locally is to clone the repo:
git clone https://github.com/pranav-kural/pdfgpt.git
Add a .env file with your OpenAI and Pinecone API keys (or add these values to .env_sample and rename it to .env):
OPENAI_API_KEY={Open AI API Key}
PINECONE_KEY={Pinecone API Key}
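For illustration, here is a minimal stdlib sketch of how such a .env file could be read into the environment. This is an assumption about the mechanism, not the project's actual loading code (the repo may well use a library such as python-dotenv instead); it only handles simple KEY=VALUE lines.

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper: skips blanks/comments and does not override
    variables that are already set in the environment.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```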
Update the Pinecone settings (index name and region) and any other parameters in the params.py file.
Then, run the server using:
uvicorn main:app --reload
Using the endpoint below, provide the URL of the PDF file you want the API to build a vector store for. The vector store is persisted in Pinecone and queried whenever requests are made to the chatbot.
/create?document_url={document_url}
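As a sketch, the /create endpoint can be called with a URL-encoded document link. The base URL below assumes uvicorn's default host and port; adjust it to wherever you run the server.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host/port

def create_index_url(document_url: str) -> str:
    """Build the /create endpoint URL, query-string encoding the PDF URL."""
    return f"{BASE_URL}/create?{urlencode({'document_url': document_url})}"

# To actually trigger indexing (requires the server to be running):
# import urllib.request
# urllib.request.urlopen(create_index_url("https://example.com/sample.pdf"))
```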
If you've already created an index and want to load it from Pinecone, use the load endpoint:
/load
To make a query and get a response:
/query?q={query text}
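The /load and /query endpoints can be exercised the same way. A small sketch of building those request URLs (again assuming uvicorn's default host and port):

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host/port

def load_url() -> str:
    """URL that asks the server to load the existing index from Pinecone."""
    return f"{BASE_URL}/load"

def query_url(q: str) -> str:
    """URL for a chatbot query; the query text is URL-encoded into q."""
    return f"{BASE_URL}/query?{urlencode({'q': q})}"

# Example (requires the server to be running):
# import urllib.request
# urllib.request.urlopen(load_url())
# print(urllib.request.urlopen(query_url("What is the document about?")).read())
```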
Below is a brief overview of what happens when a user makes a query:
- The query's embedding is generated
- The vector store is searched with the query embedding to find the closest neighbors
- The closest-neighbor embeddings are used to retrieve the portions of content most relevant to the query
- A chat completion request is made to a chat model (e.g., OpenAI GPT-3.5 Turbo), providing both the query and the retrieved content as context
- The response is returned
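The steps above can be sketched with a toy in-memory example. The real app delegates embedding to OpenAI and nearest-neighbor search to Pinecone via LlamaIndex; the hand-rolled cosine similarity, hard-coded vectors, and prompt template here are purely illustrative assumptions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], store: list[dict], k: int = 2) -> list[str]:
    """Return the text of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["embedding"]),
                    reverse=True)
    return [item["text"] for item in ranked[:k]]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble the retrieved chunks and the query into a chat-model prompt."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In the real pipeline, `retrieve` corresponds to the Pinecone similarity search, and `build_prompt`'s output is what gets sent in the chat completion request.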
To learn more about how the API works under the hood, check my blog post on PDF-GPT: LlamaIndex + OpenAI + Pinecone + FastAPI
Example screenshot for a query based on the sample document:
Planned enhancements:
- Add authentication, API key, and user management to the API
- Add authentication for endpoints (including verification of the API key through a session ID)
- Ability for users to create, save, delete, and interact with multiple chatbots