Expressive Tacotron (implementation with PyTorch)

Introduction

The Expressive Tacotron framework provides several deep learning architectures for building the prosody encoder: Global Style Tokens (GST), the Variational Autoencoder (VAE), the Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.
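To illustrate what a GST-style prosody encoder computes, the sketch below scores a reference embedding against a small bank of style tokens, converts the scores into softmax weights, and returns the weighted sum of the tokens as the style embedding. It is a single-head, scaled dot-product simplification written in plain Python. The token values, dimensions, and the function name gst_style_embedding are hypothetical and do not come from this repository, whose module is written in PyTorch and may use multi-head attention.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gst_style_embedding(reference: Vector, tokens: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical single-head GST layer (not this repository's module).

    reference: reference-encoder output for one utterance, length D.
    tokens:    K style tokens, each of length D (learned in a real model).
    Returns the K attention weights and the D-dimensional style embedding.
    """
    d = len(reference)
    # Scaled dot-product similarity between the reference and every token.
    scores = [sum(r * t for r, t in zip(reference, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    # The style embedding is the attention-weighted sum of the tokens.
    style = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return weights, style


# Toy example: three 2-D tokens; the reference points toward the first one.
weights, style = gst_style_embedding([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(weights, style)
```

Because the output is just a weighted combination of tokens, the weights can also be set by hand at inference time instead of being computed from reference audio.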

Available recipes

Expressive Mode

Attention Mode

Differences from NVIDIA Tacotron 2

  • Additional attention modes
  • Reduction factor support (as in Tacotron 1)
  • Decoder fed with the r-th (last) frame of each reduction-factor group (as in Tacotron 1)
  • Masked loss
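The reduction-factor and masked-loss items can be illustrated together. With reduction factor r, the decoder emits r frames per step, and in Tacotron 1 only the last (r-th) frame of each group is fed back as the next decoder input. A masked loss averages only over real frames, so the padding added to batch utterances of different lengths does not affect training. The sketch below uses plain Python lists, one float per frame for brevity, and hypothetical function names (decoder_inputs_for_reduction, masked_mse). Using mean squared error for the masked loss is an assumption. This is not this repository's code, which operates on PyTorch tensors.

```python
from typing import List, Sequence


def decoder_inputs_for_reduction(frames: Sequence[float], r: int, go_frame: float = 0.0) -> List[float]:
    """Teacher-forcing decoder inputs when the decoder emits r frames per step.

    Step 0 receives the <GO> frame; step t > 0 receives frames[t * r - 1],
    i.e. the last (r-th) frame of the group predicted at step t - 1.
    Each frame is a single float here (a mel vector in practice).
    """
    if len(frames) % r != 0:
        raise ValueError("pad the target so its length is a multiple of r")
    steps = len(frames) // r
    return [go_frame] + [frames[t * r - 1] for t in range(1, steps)]


def masked_mse(pred: List[List[float]], target: List[List[float]], lengths: List[int]) -> float:
    """Mean squared error over the valid frames only; padding is ignored."""
    total, count = 0.0, 0
    for p_seq, t_seq, n in zip(pred, target, lengths):
        # Only the first n frames of each sequence are real data.
        for p, t in zip(p_seq[:n], t_seq[:n]):
            total += (p - t) ** 2
            count += 1
    return total / count


print(decoder_inputs_for_reduction([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], r=2))  # [0.0, 0.2, 0.4]

# The padded values 9.0 and 5.0 fall outside the true lengths and are ignored.
print(masked_mse([[1.0, 2.0, 9.0], [0.0, 0.0, 0.0]],
                 [[1.0, 1.0, 0.0], [1.0, 0.0, 5.0]],
                 lengths=[2, 1]))  # 0.6666666666666666
```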

Training

By default, a single Tacotron 2 model is trained with Forward Attention and a reduction factor of r=2. To train in expressive mode, refer to Expressive Tacotron.
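Forward attention, the default attention mode, restricts the alignment so that at each decoder step it can either stay on the same encoder position or advance by one. In its basic form, the new alignment is α̂_t(n) = (α_{t-1}(n) + α_{t-1}(n-1)) · y_t(n), renormalized to sum to 1, where y_t is the usual softmax attention over encoder positions. The sketch below implements one step of that recursion in plain Python. It omits the optional transition agent described in the Forward Attention paper, and the function name forward_attention_step is hypothetical rather than taken from this repository.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward_attention_step(prev_alpha: List[float], energies: List[float]) -> List[float]:
    """One decoder step of forward attention (basic form, no transition agent).

    prev_alpha: alignment over the N encoder positions from the previous step.
    energies:   unnormalized attention energies e_t(n) for the current step.
    """
    # Ordinary content-based attention probabilities y_t(n).
    y = softmax(energies)
    # alpha_{t-1}(n - 1): the previous alignment shifted forward by one position.
    shifted = [0.0] + prev_alpha[:-1]
    # Position n can only be reached by staying at n or moving from n - 1.
    unnorm = [(a + s) * p for a, s, p in zip(prev_alpha, shifted, y)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


alpha = [1.0, 0.0, 0.0, 0.0]  # the alignment starts on the first encoder position
# Although the energies strongly favour position 3, the alignment can only
# advance by one position per step.
alpha = forward_attention_step(alpha, [0.0, 0.0, 0.0, 10.0])
print(alpha)  # [0.5, 0.5, 0.0, 0.0]
```

The monotonic constraint is what makes this mode robust against skipped or repeated words, at the cost of never being able to jump backward.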

  1. Convert the transcripts to phones, set "phones_path" in hparams.py to the saved phone files, and update the phone dictionary in text.py
  2. python train.py for a single GPU
  3. python -m multiproc train.py for multiple GPUs
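Step 1 amounts to fixing a phone inventory and turning each phone transcript into integer IDs for the model's embedding layer. The sketch below shows what such a mapping could look like. The phone set, the "_" padding and "~" end-of-sequence symbols, and the function name phones_to_sequence are all hypothetical; the actual symbols and file format expected by text.py and "phones_path" may differ.

```python
from typing import Dict, List

# Hypothetical phone inventory; replace it with the phones produced in step 1.
PAD, EOS = "_", "~"
PHONES = ["HH", "AH0", "L", "OW1", "W", "ER1", "D"]
SYMBOLS = [PAD, EOS] + PHONES
PHONE_TO_ID: Dict[str, int] = {p: i for i, p in enumerate(SYMBOLS)}


def phones_to_sequence(line: str) -> List[int]:
    """Map a space-separated phone string to IDs and append the EOS id."""
    ids = []
    for phone in line.split():
        # Fail loudly on phones missing from the dictionary instead of dropping them.
        if phone not in PHONE_TO_ID:
            raise KeyError(f"phone {phone!r} is missing from the dictionary")
        ids.append(PHONE_TO_ID[phone])
    return ids + [PHONE_TO_ID[EOS]]


print(phones_to_sequence("HH AH0 L OW1"))  # [2, 3, 4, 5, 1]
```

Keeping the padding symbol at index 0 makes zero-padding of batches line up with a dedicated "no phone" embedding.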

Inference Demo

  1. python synthesis.py -w checkpoints/checkpoint_200k_steps.pyt -i "hello world" --vocoder gl

The default vocoder is Griffin-Lim (--vocoder gl). For other command-line options, please refer to synthesis.py.

Acknowledgements

This implementation uses code from the following repositories: NVIDIA, MozillaTTS, ESPnet, ERISHA, ForwardAttention