TTS-Framework

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

A comprehensive review of opensource Text-to-Speech (TTS) Models

Medium series (with memes): Machine Learning Text-To-Speech: Intro, Little Theory and Math

Demo and weights

The weights can be found inside the Hugging Face (hf) Space.

DelightfulTTS + UnivNet, 22.05 kHz; see the hf Space demo PeechTTSv22050

DelightfulTTS Weights: epoch=5816-step=390418.ckpt

UnivNet Weights: vocoder_pretrained.pt

DelightfulTTS + HifiGAN, 44.1 kHz; see the hf Space demo PeechTTSv44100

DelightfulTTS Weights: epoch=2450-step=183470.ckpt

HifiGAN Weights: epoch=19-step=44480.ckpt
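The checkpoint names above follow an `epoch=N-step=M.ckpt` pattern. This looks like the default naming used by PyTorch Lightning, but that is an assumption and is not stated in this README. As a minimal sketch, the epoch and step can be read back from a file name with the standard library alone; `parse_ckpt_name` and `CkptInfo` are hypothetical names and not part of the framework.

```python
import re
from typing import NamedTuple

# Assumed pattern: "epoch=<int>-step=<int>.ckpt", as in the file names listed above.
_CKPT_RE = re.compile(r"^epoch=(\d+)-step=(\d+)\.ckpt$")


class CkptInfo(NamedTuple):
    epoch: int
    step: int


def parse_ckpt_name(filename: str) -> CkptInfo:
    """Extract the epoch and step counters from a checkpoint file name."""
    match = _CKPT_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {filename!r}")
    return CkptInfo(epoch=int(match.group(1)), step=int(match.group(2)))


if __name__ == "__main__":
    names = (
        "epoch=5816-step=390418.ckpt",
        "epoch=2450-step=183470.ckpt",
        "epoch=19-step=44480.ckpt",
    )
    for name in names:
        print(name, "->", parse_ckpt_name(name))
```

Files that do not match the pattern, such as `vocoder_pretrained.pt`, raise a `ValueError` instead of being guessed at.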

Run locally

Install dependencies

sudo apt install ffmpeg libasound2-dev build-essential espeak-ng -y
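After installing the system packages, a quick check that the command-line tools are actually on `PATH` can save a failed demo run later. The sketch below is an optional helper, not part of the framework: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the assumption is that only `ffmpeg` and `espeak-ng` are invoked as executables.

```python
import shutil
from typing import Iterable, List

# Executables assumed to be needed on PATH. libasound2-dev (development
# headers) and build-essential (a meta-package) are not single binaries,
# so they are not probed here.
REQUIRED_TOOLS = ("ffmpeg", "espeak-ng")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All required system tools were found.")
```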

Create env from the environment.yml file:

conda env create -f environment.yml

# After the setup
conda activate tts_framework

# Install torch
pip install --upgrade --force-reinstall --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

# Install nemo
pip install "nemo_toolkit[all]"

# Run demo
python app.py

Generate docs:

# live preview server
mkdocs serve

# build a static site from your Markdown files
mkdocs build

Test cases:

python -m unittest discover -v
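With its default settings, `unittest` discovery collects files whose names match `test*.py` and runs methods whose names start with `test`. The sketch below shows the shape of a file that the command above would pick up. The file name `test_sample_rates.py` and the helper `khz_to_hz` are hypothetical and only illustrate the layout; they are not taken from the repository.

```python
# test_sample_rates.py (hypothetical file name matching the default test*.py pattern)
import unittest


def khz_to_hz(khz: float) -> int:
    """Convert a sample rate in kHz to an integer rate in Hz (illustrative helper)."""
    return round(khz * 1000)


class TestSampleRates(unittest.TestCase):
    """Checks the two sample rates used by the demos in this README."""

    def test_univnet_rate(self):
        self.assertEqual(khz_to_hz(22.05), 22050)

    def test_hifigan_rate(self):
        self.assertEqual(khz_to_hz(44.1), 44100)
```

The `-v` flag prints one line per test method, which makes it easier to see which case failed.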

PeechApp/tts-peech
