
Bupa is a companion bot built using OpenAI and open-source TTS tooling.


openai-tts-server

This repo contains an example text-to-speech server that uses the OpenAI API to fetch customised answers to questions based on a given context.

Showcase

Bupa Chatbot Demo (31 May 2023)

Note: The video contains some "silent" stretches that simulate a user who stops speaking to the bot for a while. Please do not skip them, since after a period of time you will see a custom message designed to catch the user's attention.

Bupa.-.Showcase.31.May.2023.mp4

Other videos showing earlier versions are available under assets/.

Prerequisites

  • Python 3.9

Installation

  1. Clone the repo
  2. Install the dependencies with pip install -r requirements.txt
  3. Create a tts-server/.env file with the following variables (a sketch of how the server might load them follows these steps):
# options - openai or local
OPENAI_API_KEY=<your-openai-api-key>
OPENAI_MODEL=gpt-3.5-turbo

# options - vits-emo, tortoise or default
TTS_MODE=vits-emo
ROBOT_FILTER=true

COQUI_AI_BASE_URL=https://app.coqui.ai/api/v2/samples
COQUI_AI_API_KEY=<your-coqui-ai-api-key>
COQUI_AI_VOICE_ID=d2bd7ccb-1b65-4005-9578-32c4e02d8ddf

CONVERSATION_HISTORY=true
  4. To use the local gpt4all model, you first have to download it and place it under tts-server/assets/bin.
  5. To use the model (vits-emo) that was trained for this project, please contact me so that I can provide you with the URLs.
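
As a minimal sketch of how the server might read tts-server/.env, here is a loading snippet using the python-dotenv package. The variable names come from the file above; the defaults and boolean parsing shown here are assumptions, not necessarily what the server does.

# Hedged sketch: loading tts-server/.env with python-dotenv.
# Variable names come from the .env example above; defaults are assumptions.
import os

from dotenv import load_dotenv

load_dotenv("tts-server/.env")

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-3.5-turbo")

TTS_MODE = os.getenv("TTS_MODE", "default")  # vits-emo, tortoise or default
ROBOT_FILTER = os.getenv("ROBOT_FILTER", "false").lower() == "true"

COQUI_AI_BASE_URL = os.getenv("COQUI_AI_BASE_URL")
COQUI_AI_API_KEY = os.getenv("COQUI_AI_API_KEY")
COQUI_AI_VOICE_ID = os.getenv("COQUI_AI_VOICE_ID")

CONVERSATION_HISTORY = os.getenv("CONVERSATION_HISTORY", "false").lower() == "true"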

Usage

Using the CLI

  1. Run the server with python tts-server/main.py
  2. Access the server running at http://localhost:8080/, configure the Bupa bot and submit a question
  3. Alternatively, send a POST request to http://localhost:8080/ask with the following JSON body:
{
    "mood": "happy",
    "persona": "yoda",
    "text": "What is human life expectancy in the United States?"
}
  4. Also, to get the speech representation of a text, you can send a POST request to http://localhost:8080/audio with the following JSON body (see the sketch after this list):
{
    "mood": "happy",
    "persona": "yoda",
    "text": "The human expectancy in the United States fortunately is 78 years old."
}
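
For reference, here is a hedged sketch of both calls using the requests package. The endpoint paths and JSON fields come from the steps above; the response handling, in particular /audio returning raw audio bytes, is an assumption.

# Hedged sketch: exercising /ask and /audio with the requests package.
import requests

# Ask a question (fields taken from the example above).
answer = requests.post(
    "http://localhost:8080/ask",
    json={
        "mood": "happy",
        "persona": "yoda",
        "text": "What is human life expectancy in the United States?",
    },
)
print(answer.status_code, answer.text)

# Fetch the speech for a text; assumes the response body is raw audio bytes.
speech = requests.post(
    "http://localhost:8080/audio",
    json={
        "mood": "happy",
        "persona": "yoda",
        "text": "The human expectancy in the United States fortunately is 78 years old.",
    },
)
with open("answer_audio.wav", "wb") as f:
    f.write(speech.content)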

Using Docker

  1. Build the Docker image with docker build -t bupa-bot .
  2. Run the Docker container with docker run -p 5001:8080 bupa-bot
  3. Access the server running at http://localhost:5001/, configure the Bupa bot and submit a question
  4. Alternatively, to get a response, send a POST request to http://localhost:5001/ask with the following JSON body:
{
    "mood": "happy",
    "persona": "yoda",
    "text": "What is human life expectancy in the United States?"
}
  5. Also, to get the speech representation of a text, you can send a POST request to http://localhost:5001/audio with the following JSON body:
{
    "mood": "happy",
    "persona": "yoda",
    "text": "The human expectancy in the United States fortunately is 78 years old."
}
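
Since the container can take a moment to start, a small readiness check can help before sending requests. This is a sketch assuming the root path responds once the server is up.

# Hedged sketch: poll the containerized server (host port 5001) until it is up.
import time

import requests

for _ in range(30):
    try:
        if requests.get("http://localhost:5001/").ok:
            print("server is up")
            break
    except requests.ConnectionError:
        time.sleep(1)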

Text to Speech Training

Several models were created to generate speech with emotion. In the end, we found that the best results were achieved by fine-tuning an existing VITS model and adding multi-speaker functionality where each speaker corresponds to an emotion.

The notebook used to train this model is available under notebooks/, alongside the notebooks for the other models that were tested.

These are the results for our TTS model after 1,017,756 steps:

| Sentence | Neutral | Angry | Sad | Happy | Surprised |
| --- | --- | --- | --- | --- | --- |
| "I am a crazy scientist." | 0_mymodel_vits_output_1017756_neutral.webm | 0_mymodel_vits_output_1017756_angry.webm | 0_mymodel_vits_output_1017756_sad.webm | 0_mymodel_vits_output_1017756_happy.webm | 0_mymodel_vits_output_1017756_surprise.webm |
| "The cake is a lie." | 1_mymodel_vits_output_1017756_neutral.webm | 1_mymodel_vits_output_1017756_angry.webm | 1_mymodel_vits_output_1017756_sad.webm | 1_mymodel_vits_output_1017756_happy.webm | 1_mymodel_vits_output_1017756_surprise.webm |
| "Do you want to go to the supermarket with me?" | 2_mymodel_vits_output_1017756_neutral.webm | 2_mymodel_vits_output_1017756_angry.webm | 2_mymodel_vits_output_1017756_sad.webm | 2_mymodel_vits_output_1017756_happy.webm | 2_mymodel_vits_output_1017756_surprise.webm |
| "I am feeling great today!" | 3_mymodel_vits_output_1017756_neutral.webm | 3_mymodel_vits_output_1017756_angry.webm | 3_mymodel_vits_output_1017756_sad.webm | 3_mymodel_vits_output_1017756_happy.webm | 3_mymodel_vits_output_1017756_surprise.webm |

Robot Filters

The filters were designed by a post-production sound designer and applied using a set of Python libraries (kudos to Spotify's Pedalboard library).
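
The exact filter chain belongs to the sound designer; as a rough illustration of the mechanics, a Pedalboard chain can be applied to a WAV file like this. The plugin choices and settings below are assumptions, not the production filter.

# Hedged sketch: applying a robot-style effect chain with Spotify's Pedalboard.
# The plugins and parameters are illustrative, not the production filter.
from pedalboard import Bitcrush, Chorus, Pedalboard, Phaser, Reverb
from pedalboard.io import AudioFile

board = Pedalboard([
    Bitcrush(bit_depth=10),  # lo-fi, "digital" texture
    Chorus(rate_hz=1.0),     # slight doubling/detune
    Phaser(),                # metallic sweep
    Reverb(room_size=0.15),  # a touch of space
])

with AudioFile("input.wav") as f:
    audio = f.read(f.frames)
    samplerate = f.samplerate

effected = board(audio, samplerate)

with AudioFile("output.wav", "w", samplerate, effected.shape[0]) as f:
    f.write(effected)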

Next steps

  • On the existing architecture, create a robot filter to apply to the final audio
  • Create or adapt datasets with emotion for training the TTS models
  • Apply the robot filter to the emotion dataset
  • Train different models for different moods and personas (notebooks already available to train new models using GlowTTS and VITS)
  • Add more moods and personas
  • Use our own GPT model instead of the OpenAI API

Acknowledgements

This project was inspired by the following projects:

  • OpenAI API
  • Coqui TTS
  • Spotify Pedalboard