This image contains a demo STT service based on Kaldi ASR and py-kaldi-asr. Try it out by following these steps.
To start the STT service on your local machine, execute:
$ docker pull quay.io/mpuels/docker-py-kaldi-asr-and-model:kaldi-generic-en-tdnn_sp-r20180815
$ docker run --rm -p 127.0.0.1:8080:80/tcp quay.io/mpuels/docker-py-kaldi-asr-and-model:kaldi-generic-en-tdnn_sp-r20180815
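Before sending audio, it can help to check that the published port accepts connections. The following is a minimal sketch, not part of this image or of py-kaldi-asr: the helper name wait_for_port and its timeout values are hypothetical, and only the address 127.0.0.1:8080 comes from the docker run command above. Note that an open port only shows that something is accepting TCP connections; it does not prove that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8080, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper; host and port default to the mapping used in the
    docker run command (127.0.0.1:8080).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() with no arguments polls 127.0.0.1:8080 for up to 60 seconds.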
To send an audio file to the service for transcription, execute the following in a second terminal:
$ conda env create -f environment.yml
$ source activate py-kaldi-asr-client
$ ./asr_client.py asr.wav
For a list of available Kaldi models packaged in Docker containers, see https://quay.io/repository/mpuels/docker-py-kaldi-asr-and-model?tab=tags .
For a description of the available models, see https://github.com/gooofy/zamia-speech#asr-models .
Docker image tags are named according to the format
kaldi-generic-<LANG>-tdnn_<SIZE>-<RELEASEDATE>
<LANG>: There are models for English (en) and German (de).
<SIZE>: Kaldi models come in two sizes: sp (standard size) and 250 (smaller size, suitable for realtime decoding on Raspberry Pi).
<RELEASEDATE>: Usually, models released later are trained on more data and hence have a lower word error rate.
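The naming scheme can be illustrated by splitting a tag into its parts. This is a sketch under assumptions: the pattern is inferred from the example tag kaldi-generic-en-tdnn_sp-r20180815 used in the docker pull command (underscore before the size, release date written as r followed by eight digits), and the function name parse_tag is hypothetical. Tags that deviate from this shape are rejected.

```python
import re

# Pattern inferred from the example tag kaldi-generic-en-tdnn_sp-r20180815.
# The exact separators and the release date layout are assumptions.
TAG_PATTERN = re.compile(
    r"^kaldi-generic-(?P<lang>en|de)-tdnn_(?P<size>sp|250)-(?P<release>r\d{8})$"
)


def parse_tag(tag):
    """Split an image tag into its language, size, and release date fields."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError("tag does not follow the expected format: %r" % tag)
    return match.groupdict()
```

For example, parse_tag("kaldi-generic-en-tdnn_sp-r20180815") yields the language en, the size sp, and the release r20180815.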
The image is part of Zamia Speech.