retell-custom-llm-python-demo

This is a sample demo repo showing how to plug your own LLM into Retell.

This repo currently uses an OpenAI endpoint. Feel free to contribute to make this demo more realistic.

This repo is intended to be used together with the Simli Avatar API (link).

It also uses unstructured.io to process unstructured data and feed it into the DataStax AstraDB vector database, which in turn is what ensures that interactions with the historical characters are grounded in truth and historical context.

We host a public vector database at https://3cb6dbc5-f10f-43ba-9f2d-7af047ef7523-us-east1.apps.astra.datastax.com. You can connect to it by setting the ASTRA_ENDPOINT environment variable to that URL and ASTRA_TOKEN to our read-only token: AstraCS:SxlTXLOHmGZawqimghkMDaeK:5c6cbc041fb72579587e5d933982704e728b9535148a250f9ce20c7518442d09
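If you want to verify connectivity first, here is a minimal Python sketch using the astrapy client, assuming the "caesar" collection described below (the client and the collection name are the only assumptions here):

import os

from astrapy import DataAPIClient

# ASTRA_TOKEN and ASTRA_ENDPOINT set as described above.
client = DataAPIClient(os.environ["ASTRA_TOKEN"])
db = client.get_database_by_api_endpoint(os.environ["ASTRA_ENDPOINT"])
collection = db.get_collection("caesar")

# Quick smoke test: count a bounded number of documents.
print(collection.count_documents({}, upper_bound=100))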

If you want to create your own, follow the steps below:

Create Your Own Vector Database

  1. Generate structured .json data by running the script with a .pdf argument, for example (one historical source we used in this project is this book about Julius Caesar):
UNSTRUCTURED_API_KEY=<YOUR_API_KEY> python3 unstructured/main.py <PDF FILE>

You can run this multiple times if you want to structure multiple files. (A sketch of what this script might do follows this list.)

  2. Create a Datastax account and create an AstraDB database in the dashboard.

  3. Under "Integrations" in the left-hand menu, enable OpenAI as an embedding provider.

  4. Create a collection in your AstraDB called "caesar", for example.

  5. Choose OpenAI as the embedding generation method.

  6. Upload the .json files that you created in step 1.
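As referenced in step 1 above, here is a rough sketch of what a partitioning script like unstructured/main.py might do. This version uses the open-source unstructured library locally; the hosted unstructured.io API (the UNSTRUCTURED_API_KEY route) uses a different client but produces the same kind of element JSON.

import sys

from unstructured.partition.pdf import partition_pdf
from unstructured.staging.base import elements_to_json

# Partition the PDF passed on the command line into structured elements
# (titles, narrative text, tables, ...), then dump them as .json for upload.
pdf_path = sys.argv[1]
elements = partition_pdf(filename=pdf_path)
elements_to_json(elements, filename=pdf_path + ".json")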

Steps to Run the Custom LLM WebSocket Server

  1. First, install the dependencies:
pip3 install -r requirements.txt
  2. Fill out the API keys in .env.
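The exact keys live in the repo's .env, but based on the services used in this README it will contain something along these lines (the variable names here are illustrative, not authoritative; check the file itself):

OPENAI_API_KEY=sk-...       # the repo currently uses an OpenAI endpoint
RETELL_API_KEY=...          # hypothetical name; whichever Retell key the code reads
ASTRA_TOKEN=AstraCS:...     # see the queryer notes below
ASTRA_ENDPOINT=https://...  # your AstraDB API endpoint, or our public one above
UNSTRUCTURED_API_KEY=...    # only needed for the unstructured pipeline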

  3. In another terminal, use ngrok to expose this port to the public network:

ngrok http 8080
  4. Start the websocket server:
uvicorn app.server:app --reload --port=8080

You should see a forwarding address like https://dc14-2601-645-c57f-8670-9986-5662-2c9a-adbd.ngrok-free.app. Take the hostname dc14-2601-645-c57f-8670-9986-5662-2c9a-adbd.ngrok-free.app, prepend it with wss://, and append /llm-websocket (the route set up in the code to handle the LLM websocket connection) to create the URL to use in the dashboard when creating a new agent. The agent you create should then connect to your localhost.

The custom LLM URL would look like wss://dc14-2601-645-c57f-8670-9986-5662-2c9a-adbd.ngrok-free.app/llm-websocket
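For orientation, here is a heavily stripped-down sketch of the shape of that websocket handler. The message field names (response_id, content, content_complete) are assumptions for illustration; the actual Retell message schema is implemented in app/server.py.

import json

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/llm-websocket")
async def llm_websocket(ws: WebSocket):
    # Accept the connection Retell opens against the ngrok URL above.
    await ws.accept()
    while True:
        request = json.loads(await ws.receive_text())
        # A real handler would call the LLM (and the RAG queryer below)
        # here; this placeholder just returns a fixed utterance.
        await ws.send_text(json.dumps({
            "response_id": request.get("response_id", 0),
            "content": "Hello from the custom LLM server.",
            "content_complete": True,
        }))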

  5. Run the RAG database queryer API:

cd queryer

python main.py

Note that the queryer code requires two environment variables to be set: ASTRA_TOKEN and ASTRA_ENDPOINT. You can use your own values from your AstraDB dashboard, or ours above if you are using our public vector database.

This starts a FastAPI service that interfaces with DataStax AstraDB, which is where we get the RAG data used to enrich the agent's interactions.
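A condensed sketch of what that queryer might look like, assuming the astrapy client and the vectorize-enabled "caesar" collection from the setup above (the /query route and response shape are illustrative, not the repo's actual interface):

import os

from astrapy import DataAPIClient
from fastapi import FastAPI

app = FastAPI()

# Connect to AstraDB using the same two environment variables noted above.
collection = (
    DataAPIClient(os.environ["ASTRA_TOKEN"])
    .get_database_by_api_endpoint(os.environ["ASTRA_ENDPOINT"])
    .get_collection("caesar")
)

@app.get("/query")
def query(q: str, limit: int = 3):
    # $vectorize embeds the query server-side via the OpenAI integration
    # enabled earlier, then sorts documents by vector similarity.
    hits = collection.find({}, sort={"$vectorize": q}, limit=limit,
                           projection={"_id": 0})
    return {"results": list(hits)}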

Run in prod

To run in production, you probably want to customize your LLM solution, host the code in the cloud, and use that URL when creating the agent.