DCTTS is introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.
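The guided attention in the title constrains the attention matrix to stay near the diagonal during training. A minimal NumPy sketch of the penalty matrix from the paper, W[n, t] = 1 − exp(−(n/N − t/T)² / (2g²)) with g = 0.2 as in the paper (the function name is mine):

```python
import numpy as np

def guided_attention_weights(N, T, g=0.2):
    """Guided-attention penalty matrix from the DCTTS paper:
    W[n, t] = 1 - exp(-((n/N - t/T)**2) / (2 * g**2)).
    The penalty is zero on the diagonal and grows off-diagonal,
    pushing text position n and mel frame t to advance together."""
    n = np.arange(N).reshape(-1, 1) / N   # text positions, column vector
    t = np.arange(T).reshape(1, -1) / T   # mel frames, row vector
    return 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g ** 2))
```

During training this matrix is multiplied element-wise with the attention weights and the result is added to the loss, so off-diagonal attention is penalized.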
- NumPy >= 1.11.1
- TensorFlow >= 1.3 (note that the API of `tf.contrib.layers.layer_norm` has changed since 1.3)
- librosa
- tqdm
- matplotlib
- scipy
I train Portuguese models with the following steps:
- STEP 0. Download the TTS-Portuguese Corpus or prepare your own data.
- STEP 1. Run `python prepro.py`.
- STEP 2. Run `python train.py 1` to train Text2Mel.
- STEP 3. Run `python train.py 2` to train SSRN.

You can run STEPs 2 and 3 at the same time if you have more than one GPU card.
I generate speech samples based on phonetically balanced sentences, as the original paper does. They are already included in the repo.
- Run `synthesize.py` and check the files in `samples`.
| Dataset | Samples |
| :------------- | :------------- |
| TTS-Portuguese Corpus with Text | 2115k |
| TTS-Portuguese Corpus with Phoneme | 1734k |
A notebook intended to be run on https://colab.research.google.com is available:
- TTS-Portuguese Corpus with Text: Download this.
- TTS-Portuguese Corpus with Phoneme: Download this.
- The changes not described in the paper were inspired by the repository: dc_tts
- The paper didn't mention normalization, but without normalization I couldn't get the model to work, so I added layer normalization.
- The paper didn't mention dropout, so I added a dropout rate of 0.05 for all layers.
- The paper fixed the learning rate at 0.001, but that didn't work for me, so I decayed it.
- This implementation is inspired by the repository: dc_tts
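The layer-normalization and dropout tweaks above can be sketched in plain NumPy (a simplified illustration, not this repo's TensorFlow code; the function names and the inverted-dropout formulation are mine):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each feature vector (last axis) to zero mean and unit
    # variance; the real model would also apply a learnable gain and bias.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def dropout(x, rate=0.05, rng=None, training=True):
    # Inverted dropout with the 0.05 rate used for all layers: zero out a
    # random 5% of activations and rescale the rest so the expected value
    # is unchanged. At inference time the input passes through untouched.
    if not training:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```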
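The notes above don't state which decay schedule replaced the fixed 0.001. As one illustrative possibility, an exponential step decay starting from the paper's 0.001 could look like this (the decay rate and step count are assumptions, not this repo's actual values):

```python
def decayed_lr(step, init_lr=1e-3, decay_rate=0.5, decay_steps=100000):
    # Halve the learning rate every decay_steps training steps,
    # starting from the paper's fixed 0.001.
    return init_lr * decay_rate ** (step / decay_steps)
```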