VocalTales: Children's Storytelling Audio Chatbot

See the Medium article for context and more information.

NOTE: This does NOT work on mobile iOS devices. See Gradio Issue #2987 for details.

This is a Gradio UI application that takes in a request for a story from the microphone and speaks an interactive Choose-Your-Own-Adventure style children's story. It leverages:

OpenAI Whisper: to transcribe user audio input request
OpenAI ChatGPT (3.5-turbo): to generate a story chapter given the user's inputs
(Optional) Google Cloud Text-to-Speech: to use realistic voices when telling the story.

Pricing

WARNING: This application uses paid API services. Create quotas and watch your usage.

At the time of writing, the pricing is as follows:

whisper: $0.006 / minute (rounded to the nearest second)
gpt-3.5-turbo: $0.002 / 1K tokens
Google Text-to-Speech:
- 0 to 1 million bytes free per month
- $0.000016 USD per byte ($16.00 USD per 1 million bytes)

Check the links, as these prices can change often. At the time of writing, light use costs less than one USD.
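To make the "less than one USD" figure concrete, here is a rough estimator built from the rates listed above. It is only a sketch: the function name and the usage numbers are hypothetical, and it assumes Google bills only the bytes above the free monthly tier.

```python
# Rates copied from the pricing list above (at the time of writing; they may have changed).
WHISPER_USD_PER_MINUTE = 0.006
GPT35_USD_PER_1K_TOKENS = 0.002
GCP_TTS_FREE_BYTES_PER_MONTH = 1_000_000
GCP_TTS_USD_PER_BYTE = 0.000016


def estimate_monthly_cost(audio_seconds, chat_tokens, tts_bytes):
    """Return a rough monthly cost breakdown in USD for the three services."""
    whisper = (audio_seconds / 60) * WHISPER_USD_PER_MINUTE
    chat = (chat_tokens / 1000) * GPT35_USD_PER_1K_TOKENS
    # Assumption: only bytes beyond the free tier are charged.
    billable_bytes = max(0, tts_bytes - GCP_TTS_FREE_BYTES_PER_MONTH)
    tts = billable_bytes * GCP_TTS_USD_PER_BYTE
    return {"whisper": whisper, "chat": chat, "tts": tts, "total": whisper + chat + tts}


if __name__ == "__main__":
    # Hypothetical light month: 30 minutes of speech, 100K tokens, 200K bytes of TTS.
    print(estimate_monthly_cost(audio_seconds=30 * 60, chat_tokens=100_000, tts_bytes=200_000))
```

With these made-up numbers the total is about $0.38: $0.18 for Whisper, $0.20 for gpt-3.5-turbo, and nothing for Text-to-Speech because usage stays inside the free tier.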

Both OpenAI and Google offer free credits for new users.

Setup

Note there are two ways to speak the story: the Mac say command or GCP Text-to-Speech. On a Mac, the say command is the easiest and fastest route to running this; it uses the system voice set up in the Accessibility settings. If you are not on a Mac, or if you prefer a more realistic voice, GCP Text-to-Speech may be used instead. This requires (a) a GCP project, (b) the TTS API enabled, and (c) your account authenticated in gcloud (or the GOOGLE_APPLICATION_CREDENTIALS environment variable set).

This application has only been tested on a MacBook.
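The choice between the two speech routes could be sketched roughly as follows. This is an illustration, not the project's actual code: only the names SpeechMethod and SpeechMethod.GCP appear in this README, while the MAC member, pick_speech_method, build_say_command, and speak_on_mac are hypothetical names introduced here.

```python
import enum
import platform
import subprocess


class SpeechMethod(enum.Enum):
    MAC = "mac"  # hypothetical member name; only GCP is named in this README
    GCP = "gcp"


def pick_speech_method(prefer_gcp=False, system=None):
    """Use the Mac say command on macOS unless GCP is preferred."""
    system = system or platform.system()
    if prefer_gcp or system != "Darwin":
        return SpeechMethod.GCP
    return SpeechMethod.MAC


def build_say_command(text):
    """Build the argument list for the macOS say command."""
    return ["say", text]


def speak_on_mac(text):
    """Speak text with the system voice (only works on macOS)."""
    subprocess.run(build_say_command(text), check=True)
```

Passing the command as a list rather than a single shell string avoids quoting problems when the story text contains apostrophes or quotation marks.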

Sign up at OpenAI and acquire an OpenAI API key.
Add it as an environment variable with: export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxx"
Create virtual environment
Run pip install -r requirements.txt
If on Mac, install ffmpeg: brew install ffmpeg

Linux may also need ffmpeg installed, but this is untested.

Review and update config in config.py as desired
If using GCP TTS:
- Set in config.py: SPEECH_METHOD = SpeechMethod.GCP
- Navigate to the Google API page and enable the API
- Confirm you are authenticated in gcloud and your account has access to that API
Run with: python storyteller.py
Navigate to http://127.0.0.1:7860/ and have fun!
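Before running, it can help to confirm the two prerequisites from the steps above: the OPENAI_API_KEY variable and ffmpeg. The following is a minimal sketch of such a check; find_setup_problems is a hypothetical helper and is not part of the repository.

```python
import os
import shutil


def find_setup_problems(env=None, which=shutil.which):
    """Return a list of human-readable setup problems (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in find_setup_problems():
        print("Setup problem:", problem)
```

The env and which parameters exist only so the check can be exercised without touching the real environment.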

Running as Docker Container

Replace <image-name> with a name of your choice.

Build Docker image: docker build -t <image-name> .
Run locally with something similar to:

docker run -it --rm \
    -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/creds.json \
    -v ${HOME}/.config/gcloud/application_default_credentials.json:/tmp/creds.json \
    -e OPENAI_API_KEY=<openai-api-key> \
    -p <port>:7860 \
    <image-name> \
    python storyteller.py \
    --address=0.0.0.0 \
    --port=7860 \
    --username=<username> \
    --password=<password>

Fill in <openai-api-key>, <port>, and the optional <username> and <password>. Once the container is running, open 127.0.0.1:<port> in a browser and enter the optional username and password you provided.
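For readers curious how flags like these might reach the UI, here is one possible sketch. It is an assumption about the approach, not the contents of storyteller.py: parse_args and build_launch_kwargs are hypothetical helpers, and the defaults are guesses. The resulting dictionary uses the server_name, server_port, and auth keyword arguments accepted by Gradio's launch() method.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags shown in the docker run command above."""
    parser = argparse.ArgumentParser(description="Launch the storyteller UI")
    parser.add_argument("--address", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=7860)
    parser.add_argument("--username", default=None)
    parser.add_argument("--password", default=None)
    return parser.parse_args(argv)


def build_launch_kwargs(args):
    """Translate parsed flags into keyword arguments for a Gradio launch() call."""
    kwargs = {"server_name": args.address, "server_port": args.port}
    # Only enable login when both halves of the credential pair are given.
    if args.username and args.password:
        kwargs["auth"] = (args.username, args.password)
    return kwargs
```

Binding to 0.0.0.0 inside the container is what lets the port mapping from docker run reach the app; the 127.0.0.1 default here would only accept connections from inside the container itself.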

Deploying to Google Cloud Run

Follow the directions above to create a local Docker image.

Tag and push (Note: Follow these directions to authenticate)

docker tag <image-name> gcr.io/<project-id>/<image-name>
docker push gcr.io/<project-id>/<image-name>

Create a service account on your GCP project IAM page named: audio-storytelling-bot@<project-id>.iam.gserviceaccount.com

Deploy with the following command, setting anything in <> appropriately:

gcloud run deploy audio-storytelling-bot \
    --image gcr.io/<project-id>/<image-name> \
    --platform managed \
    --service-account=audio-storytelling-bot@<project-id>.iam.gserviceaccount.com \
    --set-env-vars=OPENAI_API_KEY=<openai-key-string> \
    --allow-unauthenticated \
    --port=7860 \
    --cpu=1 \
    --memory=512Mi \
    --min-instances=0 \
    --max-instances=3 \
    --command="python" \
    --args="storyteller.py,--address=0.0.0.0,--port=7860,--username=user,--password=storyteller"

Cloud Run will automatically scale the number of instances based on the incoming traffic. You can access the deployed Gradio application via the URL provided by the Cloud Run service.

