You can use this code as a base for real-time transcription of a phone call using Azure Speech Services.
The call audio is streamed over a websocket to your server, which relays it to the Azure websocket interface. Azure performs the recognition and the recognised phrases are printed to the console.
You'll need to sign up for Azure Speech Services and make a note of two pieces of information: the first service API key, and the region where you deployed the Speech API service (e.g. westeurope).
To run the app using Docker run the following command in your terminal:
docker-compose up
This will create a new image with all the dependencies and run it at http://localhost:8000.
You can declare the required environment variables by editing the docker-compose.yml file.
To run this on your machine you'll need an up-to-date version of Python 3.
Start by installing the dependencies with:
pip install --upgrade -r requirements.txt
Then copy the .env.example file to a new file called .env:
cp .env.example .env
Edit the .env file to add your own service credentials from Azure and other settings specific to your instance of the Azure Speech Service API.
HOSTNAME = "yourhostname.ngrok.io"
LANGUAGE = "en-GB"
KEY1 = "3234gh3gh34ghj32hj"
REGIONAL_API_ENDPOINT = "westeurope" # eg. "westeurope", "southeastasia", "uswest"
By default the server runs on port 8000.
Tools like ngrok are great for exposing ports on your local machine to the internet. If you haven't done this before, check out this guide.
If you aren't working in en-GB, change the LANGUAGE setting to any of the other supported languages listed in the Speech Service API documentation.
The Azure Speech Service API can run in multiple regions. When you first set it up you specify the region your service runs in; set the REGIONAL_API_ENDPOINT environment variable to match.
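To illustrate how the region and language settings come together, here is a minimal sketch that builds a recognition websocket URL from them. The function name build_recognition_url is hypothetical and is not taken from server.py. The URL pattern follows Azure's documented regional endpoint format as far as we know, so check it against the current Speech Service documentation before relying on it.

```python
import os
from urllib.parse import urlencode


def build_recognition_url(region: str, language: str) -> str:
    """Build a Speech Service websocket URL for a region and language.

    Hypothetical helper: server.py may assemble its URL differently.
    """
    # "format=simple" asks for the compact result format (assumed here).
    query = urlencode({"language": language, "format": "simple"})
    return (
        f"wss://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )


# Read the same variable names used in the .env file, with the README's
# example values as fallbacks. How server.py loads .env is not shown here.
region = os.environ.get("REGIONAL_API_ENDPOINT", "westeurope")
language = os.environ.get("LANGUAGE", "en-GB")
print(build_recognition_url(region, language))
```

A mismatch between REGIONAL_API_ENDPOINT and the region your key was issued for typically shows up as an authentication failure when the websocket connects, so it is worth checking this value first if the connection is refused.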
If you are working with a local install you can run the server using this command:
python ./server.py
You will need to create a new Nexmo application in order to work with this app:
Install the CLI by following these instructions. Then create a new Nexmo application that also sets up your answer_url and event_url for the app running locally on your machine.
nexmo app:create ms-speech-to-text http://<your_hostname>/ncco http://<your_hostname>/event
This will return an application ID. Make a note of it.
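When a call arrives, Nexmo requests the answer_url (the /ncco route above) and expects a call control object that tells it where to send the audio. The sketch below shows roughly what such a response could contain. The build_ncco helper and the /socket path are assumptions for illustration; the actual route name in server.py may differ.

```python
import json


def build_ncco(hostname: str, socket_path: str = "/socket") -> list:
    """Return an NCCO that connects the call audio to a websocket.

    Hypothetical helper: the websocket path is an assumed value.
    """
    return [
        {
            "action": "connect",
            "endpoint": [
                {
                    "type": "websocket",
                    "uri": f"wss://{hostname}{socket_path}",
                    # 16-bit linear PCM at 16 kHz.
                    "content-type": "audio/l16;rate=16000",
                }
            ],
        }
    ]


# The hostname is the HOSTNAME value from the .env file.
print(json.dumps(build_ncco("yourhostname.ngrok.io"), indent=2))
```

The hostname in the websocket URI must be reachable from the internet, which is why the HOSTNAME setting points at your ngrok address and not at localhost.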
If you don't have a number already in place, you will need to rent one. This can also be achieved using the CLI by running this command:
nexmo number:buy
Finally, link your new number to the application you created by running:
nexmo link:app YOUR_NUMBER YOUR_APPLICATION_ID
With your app running, call the number you assigned to it and start speaking. After a brief pause you will see whatever you say written out to the console, in real time.
This example code simply prints the responses from Azure to the console. To integrate it with your own application, extend the on_return_message function in server.py.
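As a starting point for that extension, here is a minimal sketch of pulling the recognised text out of a message. It assumes each text message from Azure carries header lines, a blank line, and then a JSON body, and that final results arrive with a Path header of speech.phrase and a DisplayText field. The extract_phrase helper is hypothetical, and the real signature of on_return_message is not shown in this document, so adapt the call to whatever arguments it receives.

```python
import json
from typing import Optional


def extract_phrase(message: str) -> Optional[str]:
    """Return the recognised text from a final-result message, else None.

    Hypothetical helper based on the assumed message layout described above.
    """
    # Split the header block from the JSON body at the first blank line.
    head, _, body = message.partition("\r\n\r\n")

    headers = {}
    for line in head.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    # Ignore interim hypotheses, turn markers and other message types.
    if headers.get("path") != "speech.phrase":
        return None

    result = json.loads(body)
    if result.get("RecognitionStatus") != "Success":
        return None
    return result.get("DisplayText")


sample = (
    "Path:speech.phrase\r\n"
    "Content-Type:application/json; charset=utf-8\r\n"
    "\r\n"
    '{"RecognitionStatus": "Success", "DisplayText": "Hello world."}'
)
print(extract_phrase(sample))
```

Inside on_return_message you could call a helper like this and pass any non-empty result to your own code, for example by writing it to a database or pushing it to a browser client.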