You can use this code as a base for real-time transcription of a phone call using the Google Cloud Speech-to-Text API.
An audio stream is sent over a websocket connection to your server and then relayed to the Google streaming interface. Speech recognition is performed and the resulting text is printed to the console.
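As a rough illustration of the relay step, the sketch below builds the kind of request object the Google streaming interface accepts and pulls the text out of a response. The function names are hypothetical and not taken from this project, and the 16 kHz linear PCM settings are an assumption about the audio the websocket delivers.

```javascript
// Hypothetical helpers; the names are illustrative, not from this project.

// Build a streaming recognition request for 16-bit linear PCM audio.
// Assumption: the websocket delivers audio/l16 at 16 kHz.
function buildStreamingRequest(languageCode = 'en-US') {
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode,
    },
    interimResults: false, // only emit final results
  };
}

// Extract the top transcript from a streaming response, or null if absent.
function extractTranscript(response) {
  const result = response && response.results && response.results[0];
  const alternative = result && result.alternatives && result.alternatives[0];
  return alternative ? alternative.transcript : null;
}

// With the @google-cloud/speech client, these could be wired up roughly as:
//   const recognizeStream = client
//     .streamingRecognize(buildStreamingRequest())
//     .on('data', (data) => console.log(extractTranscript(data)));
//   websocket.on('message', (chunk) => recognizeStream.write(chunk));
```

Keeping the request builder and the transcript extraction as plain functions makes them easy to check without a live call or Google credentials.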
You will need to set up a Google Cloud project and service account. Once these steps are completed, you will have a downloaded JSON file to set up the rest of the project. You will need this file before using the Deploy to Heroku button. If you plan on running this locally, make sure this file is saved in the project folder.
In order to run this on Heroku, you will need to gather the following information:
API_KEY
- This is the API key from your Nexmo account.
API_SECRET
- This is the API secret from your Nexmo account.
GOOGLE_CLIENT_EMAIL
- You can find this in the google_creds.json file as client_email.
GOOGLE_PRIVATE_KEY
- You can find this in the google_creds.json file as private_key.
- Be sure to select everything, as in -----BEGIN PRIVATE KEY-----\nXXXXXXXXX\n-----END PRIVATE KEY-----\n
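Because the private key is pasted with its \n sequences intact, code that reads it from an environment variable usually needs to turn those escapes back into real line breaks. The following is a minimal sketch of that step; the helper names are hypothetical, and it assumes the credentials are read from the GOOGLE_CLIENT_EMAIL and GOOGLE_PRIVATE_KEY variables described above.

```javascript
// Hypothetical helper: convert escaped "\n" sequences from an environment
// variable back into real line breaks so the key is valid PEM text.
function normalizePrivateKey(raw) {
  return raw.replace(/\\n/g, '\n');
}

// Assumption: credentials come from the two variables listed above.
function googleCredentialsFromEnv(env = process.env) {
  return {
    client_email: env.GOOGLE_CLIENT_EMAIL,
    private_key: normalizePrivateKey(env.GOOGLE_PRIVATE_KEY || ''),
  };
}
```

An object of this shape is what Google client libraries generally accept as a credentials option, though how this project passes it along is not shown here.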
Deploying to Heroku will create a new Nexmo application and phone number to begin testing with. View the logs to see the transcription response from the service. You can do this in the Heroku dashboard, or with the Heroku CLI using heroku logs -t.
You will need to create a new Nexmo application in order to work with this app:
Install the CLI by following these instructions. Then create a new Nexmo application that also sets up your answer_url and event_url for the app running locally on your machine.
nexmo app:create google-speech-to-text http://<your_hostname>/ncco http://<your_hostname>/event
This will return an application ID. Make a note of it.
If you don't have a number already in place, you will need to buy one. This can also be achieved using the CLI by running this command:
nexmo number:buy
Finally, link your new number to the application you created by running:
nexmo link:app YOUR_NUMBER YOUR_APPLICATION_ID
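When a call reaches the linked number, Nexmo requests the answer_url and expects a call control object (NCCO) in response. A minimal sketch of an NCCO that connects the call audio to a websocket on your server might look like the following. The helper name, the /socket path, and the 16 kHz audio format are assumptions for illustration, not details confirmed by this project.

```javascript
// Hypothetical helper: build the NCCO served from the answer_url (/ncco).
// Assumptions: the websocket path "/socket" and 16 kHz linear PCM audio.
function buildNcco(hostname) {
  return [
    {
      action: 'connect',
      endpoint: [
        {
          type: 'websocket',
          uri: `wss://${hostname}/socket`,
          'content-type': 'audio/l16;rate=16000',
        },
      ],
    },
  ];
}

// Example: the JSON body an answer_url handler could return.
console.log(JSON.stringify(buildNcco('example.ngrok.io'), null, 2));
```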
To run this on your machine you'll need an up-to-date version of Node.
Start by installing the dependencies with:
npm install
Then copy the .env.example file to a new file called .env:
cp .env.example .env
Edit the .env file to add in your application ID and the location of the credentials file from Google.
GOOGLE_APPLICATION_CREDENTIALS=./google_creds.json
APP_ID="12345678-aaaa-bbbb-4321-1234567890ab"
LANG_CODE="en-US"
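To catch a missing setting early, the app could validate these values at startup. The sketch below is one way to do that; loadConfig is a hypothetical helper, and the en-US fallback is an assumption rather than documented behaviour of this project.

```javascript
// Hypothetical helper: read and validate the settings from the .env file.
function loadConfig(env = process.env) {
  const required = ['GOOGLE_APPLICATION_CREDENTIALS', 'APP_ID'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    credentialsPath: env.GOOGLE_APPLICATION_CREDENTIALS,
    appId: env.APP_ID,
    // Assumption: fall back to en-US when LANG_CODE is not set.
    langCode: env.LANG_CODE || 'en-US',
  };
}
```

Passing the environment in as an argument, instead of reading process.env directly inside the function, keeps the check easy to exercise with sample values.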
Tools like ngrok are great for exposing ports on your local machine to the internet. If you haven't done this before, check out this guide.
If you aren't working in en-US, change LANG_CODE to any of the other supported languages listed in the Google Speech-to-Text API documentation.
To run the app using Docker, run the following command in your terminal:
docker-compose up
This will create a new image with all the dependencies and run it at http://localhost:3000.