
tts-german-pytorch

FastPitch (arXiv) trained on Thorsten Müller's Thorsten-2022.10 and Thorsten-21.06-emotional datasets.

Audio Samples

You can listen to some audio samples here.

Quick Setup

Required packages: torch torchaudio pyyaml phonemizer

Please refer to here for instructions on installing phonemizer and the espeak-ng backend.

~ for training: librosa matplotlib tensorboard

~ for the demo app: fastapi "uvicorn[standard]"

Download the pretrained weights for the FastPitch model (link).

Download the HiFi-GAN vocoder weights (link). Either put them into pretrained/hifigan-thor-v1 or edit the following lines in configs/basic.yaml.

# vocoder
vocoder_state_path: pretrained/hifigan-thor-v1/hifigan-thor.pth
vocoder_config_path: pretrained/hifigan-thor-v1/config.json
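
If you want to check that the vocoder paths resolve before running inference, the config can be read with pyyaml (a minimal sketch; it only assumes the two vocoder keys shown above exist in configs/basic.yaml):

from pathlib import Path
import yaml

# load the config and verify that the vocoder files exist
with open('configs/basic.yaml') as f:
    cfg = yaml.safe_load(f)

for key in ('vocoder_state_path', 'vocoder_config_path'):
    print(key, Path(cfg[key]).exists())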

Using the models

The FastPitch class from models.fastpitch is a wrapper that simplifies text-to-mel inference. The FastPitch2Wave model additionally includes the HiFi-GAN vocoder for direct text-to-speech inference.

Inferring the Mel spectrogram

from models.fastpitch import FastPitch

# load the pretrained text-to-mel model and move it to the GPU
model = FastPitch('pretrained/fastpitch_de.pth')
model = model.cuda()

# generate a mel spectrogram from German text
mel_spec = model.ttmel("Hallo Welt!")
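A quick way to inspect the result is to plot it with matplotlib (listed above as a training dependency). This is only a sketch; it assumes the returned spectrogram is a (mel bins x frames) tensor, possibly on the GPU:

import matplotlib.pyplot as plt

# move the mel spectrogram to the CPU and plot it
plt.imshow(mel_spec.squeeze().cpu().numpy(), origin='lower', aspect='auto')
plt.xlabel('frames')
plt.ylabel('mel bins')
plt.savefig('mel_hallo_welt.png')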

End-to-end Text-to-Speech

from models.fastpitch import FastPitch2Wave

# load the pretrained model (FastPitch + HiFi-GAN vocoder) and move it to the GPU
model = FastPitch2Wave('pretrained/fastpitch_de.pth')
model = model.cuda()

# synthesize a waveform from German text
wave = model.tts("Hallo Welt!")

# a list of texts can be passed as well
wave_list = model.tts(["null", "eins", "zwei", "drei", "vier", "fünf"])
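To write the synthesized audio to disk, torchaudio (already a required package) can be used. A minimal sketch, assuming model.tts returns a 1-D waveform tensor and the audio is sampled at 22050 Hz:

import torchaudio

# save the generated waveform as a WAV file
# (assumes a 22050 Hz sample rate and a 1-D float tensor)
torchaudio.save('hallo_welt.wav', wave.unsqueeze(0).cpu(), 22050)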

Web app

The web app uses the FastAPI library. To run the app you need the following packages:

fastapi: for the backend api
uvicorn: for serving the app

Install with: pip install fastapi "uvicorn[standard]"

Run with: python app.py
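
If you want to embed the model in a FastAPI service of your own, the sketch below shows one possible endpoint. It is illustrative only and not the repo's actual app.py; the /tts route name and the 22050 Hz sample rate are assumptions.

import io

import torchaudio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from models.fastpitch import FastPitch2Wave

app = FastAPI()
# load the text-to-speech model once at startup
model = FastPitch2Wave('pretrained/fastpitch_de.pth').cuda()

@app.get("/tts")
def tts(text: str):
    # synthesize a waveform and stream it back as a WAV file
    wave = model.tts(text)
    buf = io.BytesIO()
    # assumes a 22050 Hz sample rate and a 1-D waveform tensor
    torchaudio.save(buf, wave.unsqueeze(0).cpu(), 22050, format="wav")
    buf.seek(0)
    return StreamingResponse(buf, media_type="audio/wav")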

Preview:

Acknowledgements

Thanks to Thorsten Müller for the high-quality datasets.

The FastPitch files stem from NVIDIA's DeepLearningExamples.