Project DeepSpeech
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.
NOTE: This documentation applies to the master version of DeepSpeech only. Documentation for all versions is published on deepspeech.readthedocs.io.
To install and use DeepSpeech, all you have to do is:
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate
# Install DeepSpeech
pip3 install deepspeech
# Download pre-trained English model files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
# Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/audio-0.7.0.tar.gz
tar xvf audio-0.7.0.tar.gz
# Transcribe an audio file
deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav
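The download steps above repeat the release version in every URL and file name. As a minimal sketch, assuming you want to keep that version in one place, the hypothetical helpers below (`ds_asset_url`, `ds_assets`, `ds_fetch_all`, and the `DS_VERSION` and `DRY_RUN` variables are illustrative names, not part of DeepSpeech) rebuild the same three URLs from a single variable and skip files that are already present:

```shell
# Hypothetical helpers, not part of DeepSpeech: build the quick-start
# download URLs from one version variable.
DS_VERSION="${DS_VERSION:-0.7.0}"
DS_BASE_URL="https://github.com/mozilla/DeepSpeech/releases/download/v${DS_VERSION}"

# Print the download URL for one release asset (file name passed as $1).
ds_asset_url() {
    printf '%s/%s\n' "$DS_BASE_URL" "$1"
}

# Print the three files used in the quick start, one per line.
ds_assets() {
    printf '%s\n' \
        "deepspeech-${DS_VERSION}-models.pbmm" \
        "deepspeech-${DS_VERSION}-models.scorer" \
        "audio-${DS_VERSION}.tar.gz"
}

# Download each asset that is not already in the current directory.
# With DRY_RUN=1, only print the URLs and download nothing.
ds_fetch_all() {
    ds_assets | while read -r asset; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            ds_asset_url "$asset"
        elif [ ! -f "$asset" ]; then
            curl -LO "$(ds_asset_url "$asset")"
        fi
    done
}
```

After loading these functions into a shell, `DRY_RUN=1 ds_fetch_all` lists the URLs without touching the network, and `ds_fetch_all` performs the downloads. Extracting the audio archive and running `deepspeech` remain the same as in the commands above.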
A pre-trained English model is available for use and can be downloaded using the instructions below. A package with some example audio files is available for download in our release notes.
Inference runs faster on a supported NVIDIA GPU on Linux. See the release notes to find which GPUs are supported. To run deepspeech on a GPU, install the GPU-specific package:
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate
# Install DeepSpeech CUDA enabled package
pip3 install deepspeech-gpu
# Transcribe an audio file (uses the model and audio files downloaded in the first example)
deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav
Please ensure you have the required CUDA dependencies.
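The choice between the two packages can be sketched as a small check. This is only a heuristic under a stated assumption: it treats the presence of the `nvidia-smi` tool on `PATH` as a hint that an NVIDIA driver is installed. It does not verify that the GPU is supported, that the system is Linux, or that the required CUDA dependencies are present. The function name `ds_pick_package` is hypothetical.

```shell
# Hypothetical helper, not part of DeepSpeech: suggest which pip package
# to install. Assumption: nvidia-smi on PATH hints at an NVIDIA driver.
# This does NOT confirm GPU support or CUDA dependencies.
ds_pick_package() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "deepspeech-gpu"
    else
        echo "deepspeech"
    fi
}
```

Inside an activated virtualenv, this could be used as `pip3 install "$(ds_pick_package)"`. If the GPU package fails at runtime, falling back to the plain `deepspeech` package is the safer option.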
See the output of deepspeech -h for more information on the use of deepspeech. (If you experience problems running deepspeech, please check required runtime dependencies).
Table of Contents
- Using a Pre-trained Model
- Trying out DeepSpeech with examples
- Training your own Model
- Prerequisites for training a model
- Getting the training code
- Installing Python dependencies
- Recommendations
- Common Voice training data
- Training a model
- Checkpointing
- Exporting a model for inference
- Exporting a model for TFLite
- Making a mmap-able model for inference
- Continuing training from a release model
- Training with Augmentation
- Contribution guidelines
- Contact/Getting Help