Context-based OpenAI GPT-3 Chatbot

This is a demo app that uses OpenAI's GPT-3 to answer questions using context-specific documents.

The context for this chatbot is derived from the Geodata-Harvester documentation and the webapp is embedded in the same page (see Section "What is it").

How to use

Download or fork the repository.
Create an OpenAI account and get an API key.
Create the file openai_api_key.txt and add the OpenAI API key to the file.
Install the dependencies using pip install -r requirements.txt.
Run chatapp.py to start the app:
```
python chatapp.py
```
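
As a minimal sketch of how the key file from the steps above might be read at startup: the helper name `load_api_key` is hypothetical and not taken from chatapp.py, and the only assumption about the file is that it holds the key as plain text.

```python
from pathlib import Path


def load_api_key(path="openai_api_key.txt"):
    """Read the API key from a text file (hypothetical helper, not from chatapp.py)."""
    # Strip whitespace so a trailing newline in the file does not end up in the key.
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; add your OpenAI API key to it")
    return key
```

Keeping the key in a separate file rather than in the source code makes it easier to exclude from version control.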

How it works

This app is based on the documents from the Geodata-Harvester project and uses OpenAI's GPT-3 to answer questions within the context of those documents. The context and vector embeddings have been pre-processed and are stored as CSV files in the folder embeddings.
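
To illustrate what loading pre-computed embeddings from a CSV file could look like, here is a small sketch using only the standard library. The column layout (a section title followed by one column per vector component) and the function name `load_embeddings` are assumptions made for this example; the actual files in the embeddings folder may be organized differently.

```python
import csv
import io


def load_embeddings(csv_file):
    """Parse rows of the assumed form: title, v0, v1, ... into {title: [floats]}."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    embeddings = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        embeddings[row[0]] = [float(value) for value in row[1:]]
    return embeddings


# Toy in-memory data standing in for a real file; the values are made up.
sample = io.StringIO("title,0,1,2\nintro,0.1,0.2,0.3\nsetup,0.4,0.5,0.6\n")
embeddings = load_embeddings(sample)
```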

The app uses the following main steps to answer a question:

Find the most relevant document sections for a given question by comparing the question embedding with the embeddings of the document sections (cosine similarity).
Construct a prompt for the question from the most relevant document sections.
Add guidelines to the prompt to ensure that the answer is relevant to the context of the document.
Send the prompt to OpenAI's GPT-3 to generate the answer.
Return the answer and the number of tokens used to answer the question.
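
The retrieval and prompt-construction steps above can be sketched roughly as follows. This is not the code from chatapp.py: the function names, the prompt layout, and the toy two-dimensional vectors are all assumptions chosen to keep the example self-contained, and the call to the OpenAI API is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_sections(question_embedding, section_embeddings):
    """Return section titles ordered from most to least similar to the question."""
    scored = [
        (cosine_similarity(question_embedding, emb), title)
        for title, emb in section_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


def build_prompt(question, context_sections, guidelines):
    """Combine guidelines, selected context, and the question (assumed layout)."""
    context = "\n\n".join(context_sections)
    return f"{guidelines}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy data: real embeddings have far more dimensions.
sections = {"install": [1.0, 0.0], "settings": [0.0, 1.0], "usage": [0.7, 0.7]}
ranking = rank_sections([0.9, 0.1], sections)
prompt = build_prompt(
    "How do I install the package?",
    ranking[:2],
    "Answer using only the context below.",
)
```

In a real run the selected titles would be replaced by the text of the matching sections before the prompt is sent to the model.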

The total number of tokens used to answer a question is estimated as the sum of the tokens used for the question embedding, the prompt, and the response.
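
A rough sketch of this estimate is shown below. The function names are hypothetical, and the whitespace-based counter is only a crude stand-in so the example runs without extra packages; a tokenizer-based counter (for example one built on tiktoken, which is listed among the dependencies) would give counts closer to what the API reports.

```python
def whitespace_count(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def estimate_total_tokens(question, prompt, response, count_tokens=whitespace_count):
    """Sum the token counts for the question embedding, the prompt, and the response."""
    return count_tokens(question) + count_tokens(prompt) + count_tokens(response)


total = estimate_total_tokens(
    "What is it?",
    "Context: a demo app. Question: What is it? Answer:",
    "A demo app.",
)
```

Passing the counter as a parameter keeps the arithmetic independent of the tokenizer, so a more accurate counter can be swapped in without changing the rest of the code.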

Requirements

The following main Python dependencies are required:

openai
pandas
numpy
tiktoken
matplotlib
plotly
scikit-learn
scipy

For a full list of dependencies, please see the file requirements.txt. The app has been tested with Python 3.9 and 3.10.

Known Limitations

The app is limited to providing only answers that are related to the context of the documents.
Some of the training data is outdated and might no longer be relevant.
The app is currently limited to 1000 tokens per request.
Embeddings and matching content need to be provided.
The app is not optimized for speed and might be slow for large datasets.
The app does not provide any feedback on the quality of the answers.

License

This open-source project is licensed under the GNU Lesser General Public License v2.1. See the LICENSE file for details.