DelightfulTTS with HiFi-GAN and UnivNet vocoders

TTS-Framework

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

PyTorch Lightning

Medium series (with memes): Machine Learning Text-To-Speech: Intro, Little Theory and Math

Demo and weights

Weights and a live demo are available in the Hugging Face (hf) Spaces listed below.

DelightfulTTS + UnivNet, 22.05 kHz: see the hf space demo PeechTTSv22050

DelightfulTTS Weights: epoch=5816-step=390418.ckpt

UnivNet Weights: vocoder_pretrained.pt

DelightfulTTS + HiFi-GAN, 44.1 kHz: see the hf space demo PeechTTSv44100

DelightfulTTS Weights: epoch=2450-step=183470.ckpt

HiFi-GAN Weights: epoch=19-step=44480.ckpt
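
The `.ckpt` names above follow the `epoch=<N>-step=<M>.ckpt` pattern that PyTorch Lightning uses by default, so the training progress of each checkpoint can be read from its filename. A minimal sketch of that, where the helper name `parse_checkpoint_name` is hypothetical and not part of this repository:

```python
import re

# Pattern for checkpoint names of the form "epoch=<N>-step=<M>.ckpt".
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_checkpoint_name(filename):
    """Return (epoch, step) parsed from a checkpoint filename, or None if it does not match."""
    match = CKPT_PATTERN.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Checkpoint names taken from the list above.
for name in ("epoch=5816-step=390418.ckpt", "epoch=2450-step=183470.ckpt"):
    print(name, "->", parse_checkpoint_name(name))
```

Files that do not follow the pattern, such as `vocoder_pretrained.pt`, yield `None`.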

Run locally

Install system dependencies:

sudo apt install ffmpeg libasound2-dev build-essential espeak-ng -y
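
To confirm that the command-line tools from the packages above ended up on `PATH`, a quick check along these lines may help. This is a sketch, not part of the repository: the helper `missing_tools` is hypothetical, and it assumes the `ffmpeg` and `espeak-ng` packages provide executables of the same name. `libasound2-dev` and `build-essential` are not checked, since they do not install a single binary to look for.

```python
import shutil

# Executables assumed to be provided by the apt packages above.
REQUIRED_TOOLS = ("ffmpeg", "espeak-ng")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All required system tools found.")
```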

Create env from the environment.yml file:

conda env create -f environment.yml

# After the setup
conda activate tts_framework

# Install torch
pip install --upgrade --force-reinstall --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

# Install nemo
pip install "nemo_toolkit[all]"

# Run demo
python app.py

Generate docs:

# live preview server
mkdocs serve

# build a static site from your Markdown files
mkdocs build

Test cases:

python -m unittest discover -v