With Twilio Media Streams, you can now extend the capabilities of your Twilio-powered voice application with real-time access to the raw audio stream of phone calls. For example, we can build tools that transcribe the speech from a phone call live into a browser window, run sentiment analysis on the speech from a phone call, or even use voice biometrics to identify individuals.
If you prefer a step-by-step guide to building this yourself, this blog post walks you through transcribing speech from a phone call into text, live in the browser, using Twilio and Google Speech-to-Text with Node.js.
Before we get started, make sure you have:
- A free Twilio account
- A Google Cloud account
- ngrok installed
- The Twilio CLI installed
- Set up a Google project and retrieve a service account key:
  a. Install and initialize the Cloud SDK.
  b. Set up a new GCP project.
  c. Enable the Google Speech-to-Text API for that project.
  d. Create a service account.
  e. Download a private key as JSON.
- Modify the .env.sample file to include the path to your JSON service account key and save it as a .env file.
- Run the following commands:
Buy a phone number (I have used the GB country code to buy a mobile number, but feel free to change this for a number local to you):
$ twilio phone-numbers:buy:mobile --country-code GB
Start ngrok:
$ ngrok http 8080
While this is running, open a new terminal window, copy the forwarding HTTPS URL (https://xxxxx.ngrok.io), and set your Twilio number's voice URL to it:
$ twilio phone-numbers:update TWILIO_NUMBER --voice-url https://xxxxxxxx.ngrok.io
Install dependencies and start your server:
$ npm install
$ npm start
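For orientation, here is a minimal sketch of the two things a server behind that voice URL has to do: answer Twilio's webhook with TwiML that starts a media stream, and decode the audio messages that then arrive over a WebSocket. The helper names (buildStreamTwiml, decodeMediaMessage), the /media path, and the spoken prompt are hypothetical and not taken from the sample project. The message shape assumes Twilio's Media Streams format, where each WebSocket message is JSON with an event field and "media" events carry base64-encoded audio in media.payload.

```javascript
// Hypothetical helper: build the TwiML returned from the voice URL.
// <Start><Stream> asks Twilio to fork the call audio to a WebSocket endpoint.
function buildStreamTwiml(host) {
  // "/media" is an assumed WebSocket path, not one defined by the sample project.
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Start>',
    `    <Stream url="wss://${host}/media" />`,
    '  </Start>',
    // Illustrative prompt and pause to keep the call open while audio streams.
    '  <Say>Start speaking to see your audio transcribed.</Say>',
    '  <Pause length="30" />',
    '</Response>',
  ].join('\n');
}

// Hypothetical helper: turn one WebSocket message into raw audio bytes.
// Returns a Buffer for "media" events and null for any other event
// (for example "connected", "start", or "stop").
function decodeMediaMessage(raw) {
  const msg = JSON.parse(raw);
  const isMedia =
    msg.event === 'media' &&
    msg.media &&
    typeof msg.media.payload === 'string';
  if (!isMedia) return null;
  // The payload is base64 text; Twilio streams 8 kHz mu-law audio.
  return Buffer.from(msg.media.payload, 'base64');
}

// Usage: a fake two-byte media message, shaped like the ones Twilio sends.
const sample = JSON.stringify({
  event: 'media',
  media: { payload: Buffer.from([0xff, 0x7f]).toString('base64') },
});
console.log(decodeMediaMessage(sample).length); // 2
console.log(buildStreamTwiml('xxxxx.ngrok.io'));
```

In a real server, the decoded buffers would be written to a Google Speech-to-Text streaming request configured for mu-law audio at 8000 Hz, and the resulting transcripts pushed to the browser.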