The expressive Tacotron framework includes various deep learning architectures, such as Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors, for building the prosody encoder.
- More attention modes
- Reduction factor supported (Tacotron1)
- Feeding every r-th frame to the Decoder when using a reduction factor (Tacotron1)
- Masked loss
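Masked loss means the loss is computed only over real frames, ignoring the zero-padding added to batch utterances of different lengths. A minimal NumPy sketch of the idea (illustrative only; the repo's actual loss lives in its training code):

```python
import numpy as np

def masked_l1_loss(pred, target, lengths):
    """L1 loss that ignores padded frames beyond each utterance's true length.

    pred, target: (batch, max_len, n_mels) arrays; lengths: list of true
    frame counts per batch item.
    """
    batch, max_len, n_mels = pred.shape
    # mask[b, t] = 1 for valid frames, 0 for padding
    mask = (np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]).astype(pred.dtype)
    diff = np.abs(pred - target) * mask[:, :, None]
    # normalize by the number of valid elements, not the padded total
    return diff.sum() / (mask.sum() * n_mels)
```

Without the mask, padded frames would dilute the average and reward the model for predicting silence at the end of every batch.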
Single Tacotron2 with Forward Attention by default (r=2). If you want to train in expressive mode, refer to Expressive Tacotron.
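Forward attention (Zhang et al., 2018) constrains the alignment to move monotonically: at each decoder step the attention can only stay at the same encoder position or advance. A minimal NumPy sketch of one step, using a fixed transition probability `u` (the transition-agent variant in the paper predicts `u` from the decoder state instead):

```python
import numpy as np

def forward_attention_step(alpha_prev, energies, u=0.5):
    """One forward-attention step over N encoder positions.

    alpha_prev: previous alignment distribution, shape (N,).
    energies:   content-based attention probabilities for this step, shape (N,).
    u:          probability of advancing one encoder position.
    """
    # the alignment either stays put or shifts right by exactly one position
    shifted = np.concatenate(([0.0], alpha_prev[:-1]))
    alpha = ((1.0 - u) * alpha_prev + u * shifted) * energies
    return alpha / alpha.sum()  # renormalize to a distribution
```

Because probability mass can only stay or move one step right, skipping ahead is impossible, which stabilizes alignments early in training.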
- convert texts to phones, save them to the "phones_path" set in hparams.py, and change the phone dictionary in text.py
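The conversion is a dictionary lookup from words to phone sequences. The words and phone symbols below are made up for illustration; the real mapping comes from the phone dictionary you configure in text.py:

```python
# Toy grapheme-to-phoneme lookup. PHONE_DICT here is a stand-in for the
# phone dictionary configured in text.py.
PHONE_DICT = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}

def text_to_phones(text, unk="<unk>"):
    """Map each whitespace-separated word to its phone sequence."""
    phones = []
    for word in text.lower().split():
        # fall back to an unknown token for out-of-vocabulary words
        phones.extend(PHONE_DICT.get(word, [unk]))
    return phones
```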
```
# single GPU
python train.py

# multiple GPUs
python -m multiproc train.py
```
```
python synthesis.py -w checkpoints/checkpoint_200k_steps.pyt -i "hello world" --vocoder gl
```
The default vocoder is Griffin-Lim. For other command-line options, please refer to synthesis.py.
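Griffin-Lim recovers a waveform from a magnitude spectrogram by iteratively re-estimating the missing phase. A rough NumPy/SciPy sketch of the idea behind the `gl` vocoder; the STFT parameters (n_fft=512, fs=22050) are assumptions, and the repo's pipeline additionally needs a mel-to-linear inversion step not shown here:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_fft=512, n_iters=30, fs=22050):
    """Reconstruct a waveform from an STFT magnitude via phase estimation.

    magnitude: linear-scale spectrogram of shape (n_fft // 2 + 1, frames).
    """
    # start from random phase with unit modulus
    rng = np.random.default_rng(0)
    angles = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iters):
        # invert with the current phase guess, then re-analyze to refine it
        _, signal = istft(magnitude * angles, fs=fs, nperseg=n_fft)
        _, _, spec = stft(signal, fs=fs, nperseg=n_fft)
        angles = np.exp(1j * np.angle(spec))
    _, signal = istft(magnitude * angles, fs=fs, nperseg=n_fft)
    return signal
```

More iterations give cleaner phase at the cost of synthesis speed, which is why neural vocoders are usually preferred for final quality.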
This implementation uses code from the following repos: NVIDIA, MozillaTTS, ESPNet, ERISHA, ForwardAttention