This repository explores a very basic pipeline for chatting with your own documentation, using refinery as the example.
Watch the full tutorial on YouTube (under 20 min):
The example in refinery is already set up, but let me briefly explain the steps for other projects that follow this approach.
- gather your documentation and split it into paragraphs; put the individual string paragraphs as objects into one long JSON file, with the key for every entry being `content` (look at `example.json`, and see the sketch after this list)
- create a new project in refinery and upload that JSON file
- select an embedding; keep in mind that we're dealing with asymmetric semantic search (I used `content-classification-sentence-transformers/msmarco-distilbert-base-v4`)
- create a new read/write access token in the admin area in refinery and save it as an environment variable in the `.env` file under `GATES_KEY`
- activate the endpoint in gates and select the similarity search option
- lastly, change the `KERN_PROJECT_ID` in `server.py` to your Kern project ID
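
As a rough illustration of that upload format, here is a minimal sketch that turns a list of paragraph strings into a JSON file with one `content` entry per paragraph (the paragraph list is a placeholder; compare the result with `example.json`):

```python
import json

# Placeholder paragraphs; in practice, these come from splitting your documentation.
paragraphs = [
    "First paragraph of your documentation...",
    "Second paragraph of your documentation...",
]

# One object per paragraph, each keyed by "content", as refinery expects here.
records = [{"content": paragraph} for paragraph in paragraphs]

with open("example.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```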
- save your organization ID under the environment variable `OPENAI_ORG_ID` in `.env`
- save your OpenAI API key under the environment variable `OPENAI_API_KEY` in `.env` (a sketch of reading these variables follows this list)
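
For reference, the three variables could be read like this; a minimal sketch assuming `python-dotenv` is used (the actual loading code in `server.py` may differ):

```python
# A .env file with placeholder values would look like:
#
#   GATES_KEY=<your refinery read/write token>
#   OPENAI_ORG_ID=<your OpenAI organization ID>
#   OPENAI_API_KEY=<your OpenAI API key>
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

GATES_KEY = os.environ["GATES_KEY"]
OPENAI_ORG_ID = os.environ["OPENAI_ORG_ID"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
```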
- create a new environment (e.g. `conda create --name docs_chatbot python=3.10`)
- activate it (e.g. `conda activate docs_chatbot`)
- install the dependencies: `pip install -r requirements.txt`
Start two separate terminals with the relevant Python environment activated. Run `uvicorn server:app` in the first one and `python demo.py` in the second (in that order). You should now have access to the Gradio demo at http://127.0.0.1:7860/, or at whatever route your terminal tells you. It is important that the FastAPI service runs on port 8000, as this is hardcoded in `demo.py`.
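
To make that port dependency concrete, here is a sketch of what a request against the running service could look like; the `/chat` path and the payload/response keys are hypothetical for illustration and not the actual API of `server.py`:

```python
import requests

# Port 8000 is fixed, matching the hardcoded value in demo.py;
# the "/chat" endpoint path is an assumption for this sketch.
API_URL = "http://127.0.0.1:8000/chat"

def ask(question: str) -> str:
    # Payload and response keys ("question"/"answer") are hypothetical.
    response = requests.post(API_URL, json={"question": question})
    response.raise_for_status()
    return response.json()["answer"]

if __name__ == "__main__":
    print(ask("How do I upload records to refinery?"))
```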
