PPSpeech: Phrase based Parallel End-to-End TTS System

PyTorch implementation of PPSpeech: Phrase based Parallel End-to-End TTS System.

Requirements:

All code is written in Python 3.6.2.

Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command: nvcc --version

pip install torch torchvision
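
After installation, it can help to confirm which CUDA version the installed PyTorch build targets and whether a GPU is visible. The following is a minimal sketch, not part of this repo; the helper name `describe_torch_install` is hypothetical, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install():
    """Return a small report about the local PyTorch install (hypothetical helper)."""
    # Avoid an ImportError if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are present at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

If `cuda_build` does not match the version reported by nvcc --version, installing a PyTorch build that targets your CUDA version may be necessary.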

Install the other requirements:

pip install -r requirements.txt

To use TensorBoard, install tensorboard version 1.14.0 separately, together with a supported TensorFlow version (1.14.0).

Note:

In the paper, the authors break a sentence into phrases by predicting intonational phrase boundaries (L3) using an expanded CRF that supports dynamic features.
In this repo, for the sake of simplicity, I divide a sentence into phrases by randomly grouping words together. These groups are not true prosodic boundaries, which ultimately hurts text-to-speech quality. That does not bother me, since I wrote this repo just for experimentation.
For better quality, use a smarter (for example, model-based) phrase boundary detection algorithm, as the authors did in the paper.
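
To make the random grouping concrete, here is a minimal sketch of the idea. It is an illustration only and not the code used in this repo: the function name `random_phrases` and the `min_words`, `max_words`, and `seed` parameters are assumptions introduced for the example.

```python
import random


def random_phrases(sentence, min_words=2, max_words=5, seed=None):
    """Split a sentence into phrases of random length (illustrative sketch).

    The cuts ignore prosody entirely, so they are not true phrase boundaries.
    """
    rng = random.Random(seed)  # local RNG so a seed gives repeatable splits
    words = sentence.split()
    phrases = []
    start = 0
    while start < len(words):
        # Pick how many words go into the next phrase.
        size = rng.randint(min_words, max_words)
        # Slicing clips at the end, so the last phrase may be shorter than min_words.
        phrases.append(" ".join(words[start:start + size]))
        start += size
    return phrases


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog near the river bank"
    for phrase in random_phrases(text, seed=0):
        print(phrase)
```

Replacing this function with a real phrase boundary predictor, while keeping the same list-of-phrases output, would be one way to swap in better boundaries.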

Pre-processing

python preprocessing.py -d path_of_wavs --config configs\default.yaml

Training

python train.py -o checkpoints -l logs --name "first" --config configs\default.yaml

Inference

python inference.py -c "checkpoints\first\checkpoint_first_32000.pyt" -r "LJ002-0321.npy" --text put_your_text_here --config "configs\default.yaml" --name wave_file_name --mode 1
