This code uses OpenAI GPT, LangChain, Redis Cache, OpenSearch and Unstructured to fetch content from URLs, sitemaps, PDFs, PowerPoint files, Notion docs and images, create embeddings/vectors, and save them in a local OpenSearch database. The created collections can then be used with GPT to answer questions.
You need Node.js 18+ to run this code. You can download it here: https://nodejs.org/en/download/
First, copy the .env.example file to .env and set the required environment variables:
cp .env.example .env
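As a rough sketch of the result (the actual variable names live in .env.example, so treat these as assumptions rather than the project's real keys), the file might look like:

```
# Hypothetical example: check .env.example for the real variable names
OPENAI_API_KEY=sk-xxxx   # assumption: an OpenAI API key for embeddings/completions
PORT=3000                # port the server listens on (README default: 3000)
```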
To create the containers, install the required dependencies and launch the server, run:
yarn build
This should create the following containers:
✔ Container redis (cache)
✔ Container unstructured (handles images, PPT, text, markdown)
✔ Container opensearch (search engine)
✔ Container opensearch-dashboards (search engine dashboard)
The OpenSearch dashboard can be accessed at http://localhost:5601
Install dependencies and start the server (app.js). The server will listen on the port specified in the .env file (default: 3000).
GET /api/health
Returns 200 OK on success:
{
"success": true,
"message": "Server is healthy"
}
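As a quick sanity check, you can hit this endpoint from Node.js 18+ (which ships a global fetch); the base URL assumes the server is running locally on the default port 3000:

```js
// Minimal health check: assumes the server runs locally on the default port 3000
const res = await fetch("http://localhost:3000/api/health");
console.log(res.status, await res.json());
// 200 { success: true, message: 'Server is healthy' }
```

Save it as health.mjs and run node health.mjs (the .mjs extension enables top-level await).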
GET /api/clearcache
Returns 200 OK on success:
{
"success": true,
"message": "Cache cleared"
}
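The same pattern works for clearing the Redis cache (again assuming port 3000):

```js
// Clear the Redis cache
const res = await fetch("http://localhost:3000/api/clearcache");
console.log(await res.json()); // { success: true, message: 'Cache cleared' }
```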
POST /api/add
{
"url": "https://www.example.com/sitemap.xml", //* url of the sitemap
"collection": "collection_name", //* name of the collection to populate
"filter": "filter", // default to null - use to filter URL with this string (ex. "/blog/")
"limit": 10, // default to null
"chunkSize": 2000, // default to 2000
"chunkOverlap": 250, // default to 250
"sleep": 0 // For sitemap, time to wait between each URLs
}
Returns 200 OK on success:
{
"response": "added",
"collection": "collection_name"
}
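A sketch of calling /api/add from Node.js 18+; the URL, collection name and filter below are illustrative placeholders, not values the API expects verbatim:

```js
// Index a sitemap into a collection (field values are illustrative)
const res = await fetch("http://localhost:3000/api/add", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://www.example.com/sitemap.xml",
    collection: "collection_name",
    filter: "/blog/", // only index URLs containing this string
    limit: 10,        // stop after 10 URLs
  }),
});
console.log(await res.json()); // { response: 'added', collection: 'collection_name' }
```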
DELETE /api/collection
{
"collection": "collection_name", //* name of the collection to delete
}
Returns 200 OK on success:
{
"success": true,
"message": "{collection_name} has been deleted"
}
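Deleting a collection follows the same shape; note the JSON body on a DELETE request, which Node's fetch supports:

```js
// Delete a collection by name
const res = await fetch("http://localhost:3000/api/collection", {
  method: "DELETE",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ collection: "collection_name" }),
});
console.log(await res.json()); // { success: true, message: 'collection_name has been deleted' }
```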
POST /api/live
{
"url": "https://www.example.com", //* url of the webpage
"question": "Your question", //* the question to ask
"temperature": 0 // default to 0
}
Returns 200 OK on success:
{
"response": "response_text"
}
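For example, asking a question about a live page without building a collection first (the question text is illustrative):

```js
// Ask a question directly against a live webpage
const res = await fetch("http://localhost:3000/api/live", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://www.example.com",
    question: "What is this page about?", // illustrative question
  }),
});
console.log((await res.json()).response);
```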
POST /api/question
{
"question": "your question", //* the question to ask
"collection": "collection_name", //* name of the collection to search
"model": "model_name", // default to gpt-3.5-turbo
"k": 3, // default to 3 (max number of results to use)
"temperature": 0, // default to 0
"max_tokens": 400 // default to 400
}
Returns 200 OK on success:
{
"response": "response_text",
"sources": ["source1", "source2"]
}
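And a sketch of querying a collection you populated earlier with /api/add; the optional fields are omitted so the documented defaults apply:

```js
// Query a previously populated collection using the defaults
// (gpt-3.5-turbo, k=3, temperature=0, max_tokens=400)
const res = await fetch("http://localhost:3000/api/question", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    question: "your question",
    collection: "collection_name",
  }),
});
const { response, sources } = await res.json();
console.log(response, sources);
```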
To allow external access to the app on the port set in the .env file, you can use ngrok. Follow the steps below:
- Install ngrok: https://ngrok.com/download
- Run ngrok http <port> in your terminal (replace <port> with the port set in your .env file)
- Copy the ngrok URL generated by the command and use it in your Voiceflow Assistant API step.
This is handy if you want to quickly test the endpoints in an API step within your Voiceflow Assistant.