Speech-to-text

Transcribe a local audio file with IBM Watson Speech to Text. So far, only synchronous transcription requests are implemented.

Install Python
Create an IBM Cloud account and a free Speech to Text service instance
Copy the newly generated API key into the config.env file
Choose the correct service endpoint for your instance and add it to the config.env file
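
To illustrate how the values in config.env can be read, here is a minimal sketch of a KEY=VALUE parser. It is not necessarily how src/main.py loads its configuration. The credential key names API_KEY and SERVICE_URL and all values shown are placeholders chosen for this example; check the config.env in the repository for the actual names.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def load_config(path: str = "config.env") -> dict[str, str]:
    """Read and parse an env-style file from disk."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


# API_KEY and SERVICE_URL are hypothetical key names used for illustration only.
example = """
# credentials
API_KEY=your-api-key
SERVICE_URL=https://example.invalid/your-instance
LANGUAGE=en-US_BroadbandModel
"""
config = parse_env(example)
print(sorted(config))
```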

Clone the GitHub repository and change into the local copy

git clone https://github.com/RapTho/speech-to-text.git
cd speech-to-text

Install the Python dependencies

pip3 install -r requirements.txt

Start the transcription, wait until it finishes, then check stdout or transcript.txt for the result

python3 src/main.py
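
For orientation, a synchronous request against the service can be sketched as follows. This assumes the official ibm-watson Python SDK; whether src/main.py is built on that SDK is an assumption, and the function names transcribe and extract_transcript are invented for this example. The SDK is imported lazily so that the response-handling helper can be tried without credentials.

```python
def extract_transcript(response: dict) -> str:
    """Join the top alternative of every result segment into one string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)


def transcribe(audio_path: str, api_key: str, service_url: str,
               content_type: str, model: str) -> str:
    """Send one synchronous recognize request and return the transcript."""
    # Imported here so the helper above works without the SDK installed.
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    from ibm_watson import SpeechToTextV1

    stt = SpeechToTextV1(authenticator=IAMAuthenticator(api_key))
    stt.set_service_url(service_url)
    with open(audio_path, "rb") as audio:
        response = stt.recognize(
            audio=audio, content_type=content_type, model=model
        ).get_result()
    return extract_transcript(response)


# A hand-written response in the shape the service returns, for illustration.
sample = {
    "results": [
        {"alternatives": [{"transcript": "hello world "}]},
        {"alternatives": [{"transcript": "second segment "}]},
    ]
}
print(extract_transcript(sample))  # hello world second segment
```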

You may replace the audio file in src/input and, if necessary, adjust AUDIO_FORMAT in the config.env file to match.
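
If you swap audio files often, the format can be derived from the file extension instead of being edited by hand. The sketch below is a hypothetical helper, not part of the repository. It maps a few common extensions to content types accepted by the service. Whether AUDIO_FORMAT expects exactly these strings is an assumption, and the file name is made up.

```python
from pathlib import Path

# A small subset of the content types the service accepts.
CONTENT_TYPES = {
    ".flac": "audio/flac",
    ".mp3": "audio/mp3",
    ".ogg": "audio/ogg",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
}


def guess_content_type(filename: str) -> str:
    """Return the content type for a file name, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported audio extension: {suffix!r}") from None


print(guess_content_type("src/input/sample.MP3"))  # audio/mp3
```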

Change the language by modifying the LANGUAGE key in the config.env file. Choose one of the existing language models.

Adjust background audio suppression with the BACKGROUND_SUPPRESSION key in the config.env file. Choose a value between 0.0 and 1.0, where 0.0 means no suppression and 1.0 is the maximum.
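
Because the value is read as text, it can help to validate it before sending a request. The following is a minimal sketch of such a check. The function name parse_suppression is made up for this example and does not come from the repository.

```python
def parse_suppression(raw: str) -> float:
    """Convert the BACKGROUND_SUPPRESSION string to a float in [0.0, 1.0]."""
    value = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 1.0:
        raise ValueError(
            f"BACKGROUND_SUPPRESSION must be between 0.0 and 1.0, got {value}"
        )
    return value


print(parse_suppression("0.5"))  # 0.5
```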

Build the container. All podman commands shown here also work with docker.

podman build -t speech-to-text:latest .

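Run the container with the local output directory mounted, so that the output is available on the host

podman run -v ${PWD}/src/output/:/app/src/output/ speech-to-text:latest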