
# FastSpeech

Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"
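The central idea of FastSpeech is to replace autoregressive decoding with a feed-forward model plus a length regulator that expands each input token's hidden state by its predicted duration. As a minimal NumPy sketch of that length-regulation step (an illustration of the technique, not this repo's code):

```python
import numpy as np

def length_regulate(encoder_out, durations):
    """Expand each input token's hidden state by its duration.

    encoder_out: (T_text, H) encoder hidden states
    durations:   (T_text,) integer mel-frame count per token
    Returns an array of shape (sum(durations), H), aligned to mel frames.
    """
    return np.repeat(encoder_out, durations, axis=0)

# toy example: 3 tokens expanded to 2 + 1 + 3 = 6 mel frames
h = np.arange(6.0).reshape(3, 2)
out = length_regulate(h, np.array([2, 1, 3]))
print(out.shape)  # (6, 2)
```

At training time the durations come from the teacher model's attention alignments; at inference they come from the duration predictor, which also makes speech rate controllable by scaling the predicted durations.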

## Training

  1. Set `data_path` in `hparams.py` to the LJSpeech dataset folder.
  2. Set `teacher_dir` in `hparams.py` to the directory where the teacher's alignments and mel-spectrogram targets are saved.
  3. Place a checkpoint of the pre-trained Transformer-TTS model (the weights of its embedding and encoder layers are reused).
  4. Run `python train.py`.
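The steps above amount to editing a few paths before launching training. An illustrative excerpt of what `hparams.py` might look like (the variable names `data_path` and `teacher_dir` come from the steps above; the checkpoint variable name and all paths are placeholders):

```python
# hparams.py -- illustrative excerpt; adjust paths to your machine
data_path = '/path/to/LJSpeech-1.1'        # step 1: LJSpeech dataset root
teacher_dir = '/path/to/teacher_outputs'   # step 2: teacher alignments + mel targets
teacher_checkpoint = '/path/to/transformer_tts.pt'  # step 3: pre-trained Transformer-TTS (name is a placeholder)
```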

## Training curves (orange: character / blue: phoneme)

The sizes of the training datasets differ because the Transformer-TTS teacher trained on phoneme input produces more diagonal attention alignments, so more utterances pass the alignment filter. With a train:val:test split of 8:1:1, the totals are 1126 utterances for character input and 3412 for phoneme input.
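FastSpeech extracts ground-truth durations from the teacher's attention: each mel frame is assigned to the text token it attends to most, and a token's duration is the number of frames assigned to it. A minimal sketch of that extraction (the function name is hypothetical, and real pipelines typically also pick the most diagonal attention head and discard poorly aligned utterances, which is the filtering described above):

```python
import numpy as np

def durations_from_attention(attn):
    """Derive per-token durations from a teacher attention matrix.

    attn: (T_mel, T_text) soft alignment from Transformer-TTS.
    Each mel frame is assigned to its argmax text token; a token's
    duration is the number of frames assigned to it.
    """
    assignments = attn.argmax(axis=1)          # (T_mel,) token index per frame
    return np.bincount(assignments, minlength=attn.shape[1])

# toy 5-frame / 3-token alignment
attn = np.array([[0.9, 0.1, 0.0],
                 [0.8, 0.2, 0.0],
                 [0.1, 0.7, 0.2],
                 [0.0, 0.3, 0.7],
                 [0.0, 0.1, 0.9]])
d = durations_from_attention(attn)  # array([2, 1, 2])
```

Note that the durations sum to the number of mel frames, which is exactly what the length regulator needs to reconstruct the target length.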

## Training plots (orange: batch_size 64 / blue: batch_size 32)

## Audio Samples

You can listen to the audio samples here.