
Speech to Text command using IBM Watson API. Get a transcript from an audio file.

Speech to Text Python command

Simple command line tool to create text transcripts out of audio files using IBM Watson Speech to Text.


Installing from PyPI is the easiest way:

$ pip install speech-to-text

Or install the development version:

$ git clone https://github.com/rmotr/speech-to-text
$ cd speech-to-text
$ mkvirtualenv speech-to-text
$ pip install -r requirements.txt


The first thing you'll need is your Bluemix username and password. Getting them is a tedious process; if you run into issues, we've written a blog post that describes how to do it. Once you have your username and password, you can run:

$ speech_to_text -u <MY-USERNAME> -p <MY-PASSWORD> -f html -i <AUDIO-FILE> transcript.html

(You can omit the password option and you'll be prompted to type it in a secure manner.)

The -i option takes the audio file that you want to transcribe, and the text is stored in transcript.html in HTML format. To select a different format, see below.
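If you want to drive the command from a Python script, the invocation above can be assembled as an argument list. This is only a sketch: `build_command` and `transcribe` are hypothetical helper names, not part of the package, and the only flags used are the ones shown above (-u, -p, -f, -i).

```python
import subprocess


def build_command(username, audio_file, output_file, fmt="html", password=None):
    """Assemble the argument list for the speech_to_text command.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    cmd = ["speech_to_text", "-u", username]
    if password is not None:
        # When the password is omitted, the tool prompts for it instead.
        cmd += ["-p", password]
    cmd += ["-f", fmt, "-i", audio_file, output_file]
    return cmd


def transcribe(username, audio_file, output_file, fmt="html", password=None):
    """Run the command and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_command(username, audio_file, output_file, fmt, password),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.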


There are currently four built-in formatters: html (default), markdown, json, and original. Pass the -f option with any of these names to select one.
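To illustrate the idea of selecting a formatter by name, here is a minimal sketch of a name-to-function lookup. It is not the tool's actual implementation: the function names, the input shape (a list of transcribed text chunks), and the exact output of each format are assumptions. The original formatter is left out because its output is not described here.

```python
import html
import json


def to_html(chunks):
    # One escaped <p> element per chunk (assumed layout).
    return "\n".join("<p>{}</p>".format(html.escape(c)) for c in chunks)


def to_markdown(chunks):
    # Paragraphs separated by a blank line (assumed layout).
    return "\n\n".join(chunks)


def to_json(chunks):
    # The "transcript" key is an assumption made for this sketch.
    return json.dumps({"transcript": chunks}, indent=2)


# Hypothetical registry keyed by the value passed to -f.
FORMATTERS = {"html": to_html, "markdown": to_markdown, "json": to_json}


def render(chunks, fmt="html"):
    """Render chunks with the named formatter, defaulting to html."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError("unknown formatter: {!r}".format(fmt))
    return formatter(chunks)
```

With a lookup table like this, supporting another format only requires adding one function and one dictionary entry.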


Under the examples/ directory you can find a short audio file containing the first 30 seconds of Jacob Kaplan-Moss's keynote from PyCon 2015, along with the resulting transcripts (in HTML and Markdown format).

Watson Documentation



Supported audio file types:

  • audio/flac
  • audio/l16
  • audio/wav
  • audio/ogg;codecs=opus
  • audio/mulaw
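Before sending a file, it can be handy to check whether its extension matches one of the types listed above. The sketch below is a hypothetical helper, not part of the tool. The extension mapping is an assumption, and audio/l16 and audio/mulaw are left out because they have no single conventional file extension.

```python
import os

# Assumed mapping from file extension to the content types listed above.
CONTENT_TYPES = {
    ".flac": "audio/flac",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg;codecs=opus",
    ".opus": "audio/ogg;codecs=opus",
}


def guess_content_type(path):
    """Return the content type for path, or None if the extension is unknown."""
    extension = os.path.splitext(path)[1].lower()
    return CONTENT_TYPES.get(extension)
```

A None result means the file probably needs converting to one of the supported types first.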