Transcribe a phone call in real-time
Do your customers or users use the telephone? What sort of discussions do you have on the phone?
Would their phone calls be more efficient if you could transcribe what is said on the call? What if you could analyze the transcription using natural language understanding? And what if you could do all of this in real time, while the call is still ongoing?
Analysis of unstructured data, such as the audio from phone calls, can bring many benefits, such as providing guidance to people on the phone to help make their call more effective, prioritising users who would benefit from additional support, or identifying other automated actions that can be taken in response to phone call discussions. Specific responses vary depending on the use case, but all such solutions have one thing in common: the need to transcribe the audio and analyze the transcriptions in real time.
This code pattern shows developers how to stream phone call audio through IBM Watson Speech to Text and IBM Watson Natural Language Understanding services.
When you have completed this code pattern, you will understand how to:
- use IBM Watson Speech to Text to transcribe audio in real-time
- use IBM Watson Natural Language Understanding to perform analysis on transcriptions in real-time
Demo
Video recording of the code pattern in action: youtu.be/So3b4uJGaBw
Flow
- One person makes a phone call to a phone number managed by Twilio (more details)
- Twilio routes the phone call to the receiver, who answers the call (more details)
The caller and receiver start talking to each other. While they are doing this...
- Twilio streams a copy of the audio from the phone call to your application (more details)
- Your application sends audio to the Speech to Text service for transcribing (more details)
- Speech to Text asynchronously sends transcriptions to the app when they are available (more details)
- The app submits the transcription text to Natural Language Understanding for analysis (more details)
- The transcriptions and analyses can be monitored from a web page
Steps
Set the following environment variables in your shell for use in the commands below
CODE_ENGINE_PROJECT_NAME=my-phone-stt-demo-project
CODE_ENGINE_APP_NAME=phone-stt-demo
IBM_CLOUD_REGION=eu-gb
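Because the later commands depend on these variables, it can help to confirm they are set before continuing. The following is a minimal sketch only; `require_vars` is a hypothetical helper name and not part of this code pattern.

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

CODE_ENGINE_PROJECT_NAME=my-phone-stt-demo-project
CODE_ENGINE_APP_NAME=phone-stt-demo
IBM_CLOUD_REGION=eu-gb

require_vars CODE_ENGINE_PROJECT_NAME CODE_ENGINE_APP_NAME IBM_CLOUD_REGION \
  && echo "All variables are set"
```

The same helper could be reused later for variables such as CONTAINER_REGISTRY_REGION.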
- Log into IBM Cloud
- Create a project in IBM Code Engine
- Create API keys for Watson services used in this project
- Clone the source code
- Build the application image
- Push the application image to a container registry
- Create a pull secret for the image
- Deploy the application
- Configure Twilio to use your application
- Make a phone call
1. Log into IBM Cloud
Log in to the desired account with the IBM Cloud CLI using ibmcloud login
2. Create IBM Code Engine project
The quickest way to get up and running is to use the IBM Cloud CLI with the Code Engine plugin.
If you do not have a Code Engine project, create one
ibmcloud ce project create --name $CODE_ENGINE_PROJECT_NAME
If you already have a Code Engine project, target the project
ibmcloud ce project target --name $CODE_ENGINE_PROJECT_NAME
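If you are scripting this step, the two cases above could be combined into one function. This is a sketch under the assumption that the target command exits with a non-zero status when the project does not exist; `ensure_project` is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: target the project if it exists, otherwise create it.
# Assumes the target command fails (non-zero exit) for an unknown project.
ensure_project() {
  ibmcloud ce project target --name "$1" \
    || ibmcloud ce project create --name "$1"
}

# Usage:
#   ensure_project "$CODE_ENGINE_PROJECT_NAME"
```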
3. Create IBM Watson credentials
3.1 Speech to Text
Create an instance of the IBM Watson Speech to Text service
ibmcloud resource service-instance-create \
phone-stt-demo-speech-to-text \
speech-to-text \
lite \
$IBM_CLOUD_REGION
This creates a free instance of the service, which should be sufficient for trying this project. See the catalog page for more details about the limitations.
Create an API key for your Speech to Text instance
ibmcloud resource service-key-create \
code-engine-stt-credentials \
Manager \
--instance-name phone-stt-demo-speech-to-text
Extract the API key and instance URL into environment variables
STT_API_KEY=$(ibmcloud resource service-key code-engine-stt-credentials --output json | jq -r ".[0].credentials.apikey")
STT_INSTANCE_URL=$(ibmcloud resource service-key code-engine-stt-credentials --output json | jq -r ".[0].credentials.url")
echo "Speech to Text : API key : $STT_API_KEY"
echo "Speech to Text : URL : $STT_INSTANCE_URL"
This uses jq to extract the API key and URL. If you don't want to use jq, you can simply run ibmcloud resource service-key code-engine-stt-credentials --output json and create environment variables with the API key and URL from the credentials that are output.
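Whichever way you extract the values, it is worth confirming they are usable before storing them in a Secret. With the -r option, jq prints the literal string null when a field is missing, so an empty check alone is not enough. The sketch below uses a hypothetical helper, `credential_ok`, to illustrate one way of checking; the same check would apply to the Natural Language Understanding values in step 3.2.

```shell
# Hypothetical helper: succeed only if the value looks like a real credential.
# jq -r prints the literal string "null" when the requested field is absent.
credential_ok() {
  name=$1
  value=$2
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "$name was not extracted; check the service key output" >&2
    return 1
  fi
}

# Usage after running the extraction commands above:
#   credential_ok STT_API_KEY "$STT_API_KEY"
#   credential_ok STT_INSTANCE_URL "$STT_INSTANCE_URL"
```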
Create a Secret with the API key and instance URL
ibmcloud ce secret create \
--name phone-demo-apikey-stt \
--from-literal STT_API_KEY=$STT_API_KEY \
--from-literal STT_INSTANCE_URL=$STT_INSTANCE_URL
3.2 Natural Language Understanding
Create an instance of the IBM Watson Natural Language Understanding service
ibmcloud resource service-instance-create \
phone-stt-demo-natural-language-understanding \
natural-language-understanding \
free \
$IBM_CLOUD_REGION
This creates a free instance of the service, which should be sufficient for trying this project. See the catalog page for more details about the limitations.
Create an API key for your Natural Language Understanding instance
ibmcloud resource service-key-create \
code-engine-nlu-credentials \
Manager \
--instance-name phone-stt-demo-natural-language-understanding
Extract the API key and instance URL into environment variables
NLU_API_KEY=$(ibmcloud resource service-key code-engine-nlu-credentials --output json | jq -r ".[0].credentials.apikey")
NLU_INSTANCE_URL=$(ibmcloud resource service-key code-engine-nlu-credentials --output json | jq -r ".[0].credentials.url")
echo "Natural Language Understanding : API key : $NLU_API_KEY"
echo "Natural Language Understanding : URL : $NLU_INSTANCE_URL"
This uses jq to extract the API key and URL. If you don't want to use jq, you can simply run ibmcloud resource service-key code-engine-nlu-credentials --output json and create environment variables with the API key and URL from the credentials that are output.
Create a Secret with the API key and instance URL
ibmcloud ce secret create \
--name phone-demo-apikey-nlu \
--from-literal NLU_API_KEY=$NLU_API_KEY \
--from-literal NLU_INSTANCE_URL=$NLU_INSTANCE_URL
4. Clone the repo
Clone the phone-stt-demo repo locally. In a terminal, run:
git clone https://github.com/IBM/phone-stt-demo
cd phone-stt-demo
5. Build the application image
docker build -t phone-stt-demo:latest .
6. Push the application image to a container registry
You can push the application image to any container registry that you like.
The instructions in this step explain how to use the IBM Cloud Container Registry, using the IBM Cloud CLI with the Container Registry plugin.
Set the following environment variables in your shell for use in the commands below
CONTAINER_REGISTRY_REGION=uk.icr.io
Specify the region to use
ibmcloud cr region-set $CONTAINER_REGISTRY_REGION
Log into the container registry
ibmcloud cr login --client docker
Create a namespace to store your application image
ibmcloud cr namespace-add $CODE_ENGINE_PROJECT_NAME
Push the image
IMAGE_LOCATION=$CONTAINER_REGISTRY_REGION/$CODE_ENGINE_PROJECT_NAME/phone-stt-demo:latest
docker tag phone-stt-demo:latest $IMAGE_LOCATION
docker push $IMAGE_LOCATION
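The image location used above follows the pattern registry/namespace/repository:tag. As an illustration of how the parts fit together, here is a small sketch; `image_location` is a hypothetical helper, and the values are the examples used in this document.

```shell
# Hypothetical helper: build an image reference of the form
# <registry>/<namespace>/<repository>:<tag>.
image_location() {
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

CONTAINER_REGISTRY_REGION=uk.icr.io
CODE_ENGINE_PROJECT_NAME=my-phone-stt-demo-project

IMAGE_LOCATION=$(image_location "$CONTAINER_REGISTRY_REGION" "$CODE_ENGINE_PROJECT_NAME" phone-stt-demo latest)
echo "$IMAGE_LOCATION"
```

If you use a different registry, only the first two arguments should need to change.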
7. Create a pull secret for the application image
The way to do this depends on the container registry that you are using.
The instructions in this step assume that you are using the IBM Cloud Container Registry.
Create an API key with permission to pull your images from the container registry.
ibmcloud iam service-id-create \
phone-stt-demo-pull-secret \
-d "Pull secret used by Code Engine for the phone-stt-demo Docker image"
ibmcloud iam service-policy-create \
phone-stt-demo-pull-secret \
--service-name container-registry \
--roles Reader
ibmcloud iam service-api-key-create \
phone-stt-demo-pull-secret-key \
phone-stt-demo-pull-secret \
--description "API key for the phone-stt-demo-pull-secret service ID used by Code Engine"
This will create an API key which will be displayed only once. You should make a copy of it as it cannot be retrieved after it has been created.
Set an environment variable with the API key
IMAGE_REGISTRY_APIKEY=<your-pull-secret-api-key>
Create a Secret with the image registry pull secret
ibmcloud ce registry \
create \
--name phone-stt-demo-registry \
--server $CONTAINER_REGISTRY_REGION \
--username iamapikey \
--password $IMAGE_REGISTRY_APIKEY
8. Deploy the application
The location of your Docker image will depend on the container registry used in Step 6.
If you used the IBM Cloud Container Registry with the instructions above, the location will be: $CONTAINER_REGISTRY_REGION/$CODE_ENGINE_PROJECT_NAME/phone-stt-demo:latest
If you used a different image registry, replace the location in the command below with the location of your image.
ibmcloud ce application create \
--name $CODE_ENGINE_APP_NAME \
--image $IMAGE_LOCATION \
--registry-secret phone-stt-demo-registry \
--cpu 0.125 --memory 0.25G --ephemeral-storage 400M \
--port 8080 \
--maxscale 1 \
--env-from-secret phone-demo-apikey-stt \
--env-from-secret phone-demo-apikey-nlu
9. Point a Twilio phone number at your deployed application
Get the URL for your application
ibmcloud ce application get \
--name $CODE_ENGINE_APP_NAME \
--output url
This will give you a URL like https://phone-stt-demo.abcdefg1a2b.eu-gb.codeengine.appdomain.cloud.
Use this as the REPLACE-THIS-URL value when following the Twilio setup instructions.
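To avoid copying the URL by hand, you could capture it in a variable and do a basic sanity check first. This is only a sketch: `get_app_url` and `is_https_url` are hypothetical helper names, and the first simply wraps the command shown above.

```shell
# Hypothetical helper: print the public URL of a Code Engine application,
# using the same command as above.
get_app_url() {
  ibmcloud ce application get --name "$1" --output url
}

# Hypothetical helper: succeed only if the value looks like an https URL.
is_https_url() {
  case "$1" in
    https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   APP_URL=$(get_app_url "$CODE_ENGINE_APP_NAME")
#   is_https_url "$APP_URL" && echo "Use this URL in the Twilio setup: $APP_URL"
```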
10. Try it out!
Open the URL from step 9 in a web browser.
Make a phone call to the Twilio phone number.
When prompted, enter the phone number that you want to call, including the international dialling code for the country the phone number is in.
(For example, to call the UK phone number 02079463287, you should enter 442079463287 when prompted.)
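The conversion in the example above can be sketched as a small shell function. `to_international` is a hypothetical helper, and it assumes the national number uses a single leading 0 as its trunk prefix, as UK numbers do; other countries may need different handling.

```shell
# Hypothetical helper: convert a national number to the digits-only
# international form by dropping one leading 0 and prefixing the country code.
to_international() {
  country_code=$1
  national_number=$2
  # ${var#0} removes a single leading "0" if present.
  printf '%s%s\n' "$country_code" "${national_number#0}"
}

to_international 44 02079463287   # prints 442079463287
```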
License
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.