Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.
Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.
As a bonus:
- No Kaldi;
- No compilation;
- No 20-step instructions;
We have also published TTS models that satisfy the following criteria:
- One-line usage;
- A large library of voices;
- A fully end-to-end pipeline;
- Natural-sounding speech;
- No GPU or training required;
- Minimalism and lack of dependencies;
- Faster than real-time on one CPU thread (!!!);
- Support for 16kHz and 8kHz out of the box.
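The "faster than real-time" criterion can be checked by comparing the time spent on synthesis with the duration of the audio produced. The following is a minimal sketch, not part of the Silero API: `real_time_factor` is a hypothetical helper, and the numbers in the usage lines are made up for illustration.

```python
def real_time_factor(processing_seconds: float,
                     num_samples: int,
                     sample_rate: int) -> float:
    """Return processing time divided by audio duration.

    Values below 1.0 mean the audio was produced faster than real time.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return processing_seconds / audio_seconds


# Illustrative numbers only: 2 s of 16 kHz audio synthesized in 0.5 s.
rtf = real_time_factor(0.5, num_samples=32000, sample_rate=16000)
print(f"RTF = {rtf:.2f}")
```

In practice `processing_seconds` would come from wrapping the synthesis call with `time.perf_counter()`, ideally after limiting the process to a single CPU thread.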
All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
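Because every checkpoint is registered in models.yml, finding one by language and format amounts to walking a nested mapping. The sketch below assumes the layout implied by the `models.stt_models.en.latest.onnx` access used in the ONNX example; the `example.com` URLs and the `latest_checkpoint` helper are hypothetical placeholders, not real entries.

```python
from typing import Any, Mapping

# Hypothetical excerpt mirroring the layout implied by
# `models.stt_models.en.latest.onnx`; the URLs are placeholders.
MODELS: Mapping[str, Any] = {
    "stt_models": {
        "en": {"latest": {"onnx": "https://example.com/en.onnx"}},
        "de": {"latest": {"onnx": "https://example.com/de.onnx"}},
    }
}


def latest_checkpoint(models: Mapping[str, Any], language: str, fmt: str) -> str:
    """Return the URL of the latest checkpoint for a language and format."""
    stt_models = models["stt_models"]
    if language not in stt_models:
        raise KeyError(f"unknown language {language!r}; "
                       f"available: {sorted(stt_models)}")
    latest = stt_models[language]["latest"]
    if fmt not in latest:
        raise KeyError(f"no {fmt!r} checkpoint for {language!r}")
    return latest[fmt]


print(latest_checkpoint(MODELS, "en", "onnx"))
```

The object returned by `OmegaConf.load('models.yml')` should support the same key lookups, so the helper is expected to work on the real file as well.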
Currently we provide the following checkpoints:
| | PyTorch | ONNX | TensorFlow | Quantization | Quality | Colab |
|---|---|---|---|---|---|---|
| English (en_v2) | ✔️ | ✔️ | ✔️ | ⌛ | link | |
| German (de_v1) | ✔️ | ✔️ | ✔️ | ⌛ | link | |
| Spanish (es_v1) | ✔️ | ✔️ | ✔️ | ⌛ | link | |
| Ukrainian (ua_v3) | ✔️ | ✔️ | ⌛ | ✔️ | N/A | |
- All examples:
  - torch (used to clone the repo in tf and onnx examples)
  - torchaudio
  - soundfile
  - omegaconf
- Additional for ONNX examples:
  - onnx
  - onnxruntime
- Additional for TensorFlow examples:
  - tensorflow
  - tensorflow_hub
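Before running an example, it can help to confirm which of the packages listed above are importable. This is a small standard-library sketch; `missing_packages` is a hypothetical helper, and the group names are only labels for the three dependency groups.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Import names, grouped as in the dependency list above.
BASE = ["torch", "torchaudio", "soundfile", "omegaconf"]
ONNX_EXTRA = ["onnx", "onnxruntime"]
TF_EXTRA = ["tensorflow", "tensorflow_hub"]

for group, names in [("base", BASE), ("onnx", ONNX_EXTRA), ("tensorflow", TF_EXTRA)]:
    print(group, "missing:", missing_packages(names) or "none")
```

The check only looks the modules up; it does not import them, so it stays fast even when the heavy frameworks are installed.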
Please see the provided Colab for details for each example below.
import torch
import zipfile
import torchaudio
from glob import glob
device = torch.device('cpu') # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details
# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)
output = model(input)
for example in output:
    # decode each item of the batch into text
    print(decoder(example.cpu()))
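The batching step in the example above can be pictured as plain list chunking. The function below is an illustrative stand-in, assuming `split_into_batches` simply groups file paths into consecutive chunks of at most `batch_size` items; it is not the library's own code, and the file names are made up.

```python
from typing import List, Sequence


def chunk_files(paths: Sequence[str], batch_size: int = 10) -> List[List[str]]:
    """Group paths into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(paths[i:i + batch_size])
            for i in range(0, len(paths), batch_size)]


# Hypothetical file names, used only to show the shape of the result.
files = [f"clip_{n}.wav" for n in range(5)]
for batch in chunk_files(files, batch_size=2):
    print(batch)
```

Under this assumption, the example above transcribes only the first batch (`batches[0]`); looping over all batches would cover every file.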
You can run our models anywhere you can import an ONNX model or run ONNX Runtime.
import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf
language = 'en' # also available 'de', 'es'
# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
read_audio, prepare_model_input) = utils
# see available models
torch.hub.download_url_to_file('', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages
# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models.en.latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
ort_session = onnxruntime.InferenceSession('model.onnx')
# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))
# actual onnx inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
SavedModel example
import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf
language = 'en' # also available 'de', 'es'
# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
read_audio, prepare_model_input) = utils
# see available models
torch.hub.download_url_to_file('', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages
# load the actual tf model
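# the `tf` key is assumed by analogy with the `onnx` key used in the ONNX example
torch.hub.download_url_to_file(models.stt_models.en.latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model', shell=True, check=True)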
tf_model = tf.saved_model.load('tf_model')
# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))
# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
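The shell command used above to unpack the SavedModel archive depends on `rm`, `mkdir` and `tar` being available. A portable alternative using only the Python standard library might look like the sketch below; `extract_fresh` is a hypothetical helper name, and the archive and directory names in the usage comment are the ones from the example.

```python
import shutil
import tarfile
from pathlib import Path


def extract_fresh(archive: str, target_dir: str) -> Path:
    """Remove target_dir if present, recreate it, and unpack archive into it."""
    target = Path(target_dir)
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Only unpack archives that come from a source you trust.
        tar.extractall(target)
    return target


# Usage with the names from the example above:
# extract_fresh('tf_model.tar.gz', 'tf_model')
```

This mirrors the three shell steps (delete, create, extract) and should behave the same on Windows, where the shell one-liner would not run as written.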
All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
Currently we provide the following speakers:
Basic dependencies (see colab):
- torch
- omegaconf
- torchaudio (required only because the TTS models are hosted together with STT; the TTS models themselves do not use it)
import torch
language = 'ru'
speaker = 'kseniya_16khz'
device = torch.device('cpu')
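model, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                      model='silero_tts',
                                                                      language=language,
                                                                      speaker=speaker)
model = model.to(device)  # gpu or cpu
audio = apply_tts(texts=[example_text],
                  model=model,
                  sample_rate=sample_rate,
                  symbols=symbols,
                  device=device)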
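To listen to the result outside a notebook, the generated samples can be written to a WAV file. The sketch below uses only the standard library and assumes that each item returned by `apply_tts` is a one-dimensional sequence of float samples in the range [-1, 1] (for a tensor, `audio[0].tolist()`); that output format is an assumption, and `save_wav` is a hypothetical helper. The usage lines write a synthetic tone rather than real model output.

```python
import math
import os
import struct
import tempfile
import wave
from typing import Sequence


def save_wav(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Synthetic 440 Hz tone standing in for model output (1 s at 16 kHz).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
out_path = os.path.join(tempfile.gettempdir(), "tts_sketch.wav")
save_wav(out_path, tone, 16000)
print("wrote", out_path)
```

With real output, and under the assumption stated above, the call would be along the lines of `save_wav('speech.wav', audio[0].tolist(), sample_rate)`.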
Also check out our wiki.
Please refer to these wiki sections:
Please refer here.
Try our models, create an issue, join our chat, email us, read our news.
Please see our wiki and tiers for relevant information and email us.
@misc{Silero Models,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{}},
  commit = {insert_some_commit_here},
  email = {}
}