AI | Knowledge Base Project | Voiceflow

OpenAI GPT, LangChain, OpenSearch and Unstructured

This code uses OpenAI GPT, LangChain, Redis Cache, OpenSearch and Unstructured to fetch content from URLs, sitemaps, PDFs, PowerPoint files, Notion docs and images, create embeddings/vectors, and save them in a local OpenSearch database. The created collections can then be used with GPT to answer questions.

Quickstart Video

Watch the video

Node.js

You need Node.js 18+ to run this code. You can download it here: https://nodejs.org/en/download/

Table of Contents

  1. Getting Started
  2. API Documentation
  3. Dependencies

Getting Started

Installation

First, copy the example .env file and set the required environment variables.

cp .env.example .env

To create the containers, install the required dependencies, and launch the server, run:

yarn build

This should create the following containers:

  ✔ Container redis (cache)
  ✔ Container unstructured (handles images, PPT, text, Markdown)
  ✔ Container opensearch (search engine)
  ✔ Container opensearch-dashboards (search engine dashboard)

The OpenSearch dashboard can be accessed at http://localhost:5601

Install dependencies and start the server (app.js)

The server will listen on the port specified in the .env file (default is 3000).
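
If you need to install dependencies and start the server manually (outside of yarn build), something like the following should work; this is a sketch that assumes yarn manages dependencies and app.js is the entry point:

yarn install
node app.js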

API Documentation

Check server health

GET /api/health

Response

  • 200 OK on success
{
  "success": true,
  "message": "Server is healthy"
}
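
You can test the endpoint with curl, assuming the server is running locally on the default port 3000:

curl http://localhost:3000/api/health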

Clear Redis cache

GET /api/clearcache

Response

  • 200 OK on success
{
  "success": true,
  "message": "Cache cleared"
}
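
Same pattern for clearing the cache (again assuming the default local port 3000):

curl http://localhost:3000/api/clearcache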

Add content to OpenSearch

POST /api/add

Request

{
  "url": "https://www.example.com/sitemap.xml", //* url of the sitemap
  "collection": "collection_name", //* name of the collection to populate
  "filter": "filter", // default to null - use to filter URL with this string (ex. "/blog/")
  "limit": 10, // default to null
  "chunkSize": 2000, // default to 2000
  "chunkOverlap": 250, // default to 250
  "sleep": 0 // For sitemap, time to wait between each URLs
}

Response

  • 200 OK on success
{
  "response": "added",
  "collection": "collection_name"
}
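
Example with curl, assuming the server is running locally on the default port 3000 (filter, limit, chunkSize, chunkOverlap and sleep are optional):

curl -X POST http://localhost:3000/api/add \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.example.com/sitemap.xml", "collection": "collection_name", "filter": "/blog/"}'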

Delete a collection

DELETE /api/collection

Request

{
  "collection": "collection_name", //* name of the collection to delete
}

Response

  • 200 OK on success
{
  "success": true,
  "message": "{collection_name} has been deleted"
}
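
Example with curl (default local setup):

curl -X DELETE http://localhost:3000/api/collection \
  -H "Content-Type: application/json" \
  -d '{"collection": "collection_name"}'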

Get a response using live webpage as context

POST /api/live

Request

{
  "url": "https://www.example.com", //* url of the webpage
  "question": "Your question", //* the question to ask
  "temperature": 0 // default to 0
}

Response

  • 200 OK on success
{
  "response": "response_text"
}
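
Example with curl (default local setup):

curl -X POST http://localhost:3000/api/live \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.example.com", "question": "Your question"}'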

Get a response using the vector store

POST /api/question

Request

{
  "question": "your question", //* the question to ask
  "collection": "collection_name", //* name of the collection to search
  "model": "model_name", // default to gpt-3.5-turbo
  "k": 3, // default to 3 (max number of results to use)
  "temperature": 0, // default to 0
  "max_tokens": 400 // default to 400
}

Response

  • 200 OK on success
{
  "response": "response_text",
  "sources": ["source1", "source2"]
}
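
Example with curl, relying on the defaults for model, k, temperature and max_tokens (server assumed to be running locally on port 3000):

curl -X POST http://localhost:3000/api/question \
  -H "Content-Type: application/json" \
  -d '{"question": "your question", "collection": "collection_name"}'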

Using ngrok

To expose the app externally on the port set in the .env file, you can use ngrok. Follow the steps below:

  1. Install ngrok: https://ngrok.com/download
  2. Run ngrok http <port> in your terminal (replace <port> with the port set in your .env file)
  3. Copy the ngrok URL generated by the command and use it in your Voiceflow Assistant API step.

This is handy if you want to quickly test the endpoints in an API step within your Voiceflow Assistant.