Neural-HMM

Neural HMMs are all you need (for high-quality attention-free TTS)


WASP Summer School 2022 - Workshop on Speech Synthesis

Installation

  1. Clone this repository: git clone https://github.com/NitheshChandher/Neural-HMM.git
  2. Initialise the submodules: git submodule init; git submodule update
  3. Make sure you have Docker installed and running.
  4. Download our pre-trained LJ Speech models.
  5. Download the pre-trained HiFi-GAN model and its config file, and place them in the hifigan folder.
  6. Run bash start.sh to install all the dependencies and start the container.
  7. Run jupyter notebook and open speech_synthesis_workshop.ipynb.

Fine-tune on the CMU Arctic dataset

  1. Download and extract the CMU Arctic dataset into the data directory. Also, move the files train.txt and validation.txt to the data/filelists directory.
  2. Check src/hparams.py for hyperparameters and set gpus.
    1. For multi-GPU training, set gpus to [0, 1, ...]
    2. For CPU training (not recommended), set gpus to an empty list []
    3. Check the location of the transcriptions.
  3. Once the hparams are updated, run python generate_data_properties.py to generate data_parameters.pt for your dataset.
  4. Run python train.py -c "checkpoints/Neural-HMM(Female).ckpt" to fine-tune the pre-trained LJ Speech model on the CMU Arctic dataset. (Note that the parentheses in the checkpoint filename must be quoted or escaped in the shell.)
    1. Checkpoints will be saved in hparams.checkpoint_dir.
    2. Tensorboard logs will be saved in hparams.tensorboard_log_dir.
  5. To resume training, run python train.py -c <CHECKPOINT_PATH>
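For orientation, the hyperparameter fields referenced in the steps above might look roughly like this inside src/hparams.py (a sketch only: the exact attribute names, surrounding structure, and default values in the file may differ, so check the file itself):

```python
# Sketch of the hparams values referenced above (see src/hparams.py for the
# authoritative definitions; these names and defaults are illustrative).
gpus = [0]                       # single-GPU training; use [] for CPU (not recommended)
checkpoint_dir = "checkpoints"   # where training checkpoints are written
tensorboard_log_dir = "logs"     # where Tensorboard logs are written
```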

Miscellaneous

Mixed-precision training or full-precision training

  • In src/hparams.py, change hparams.precision to 16 for mixed-precision training or 32 for full-precision training.
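Concretely, the setting amounts to one line in src/hparams.py (a sketch; the surrounding attribute definitions are omitted):

```python
# In src/hparams.py (illustrative fragment):
precision = 16   # mixed-precision training: faster and uses less GPU memory
# precision = 32 # full-precision training
```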

Multi-GPU training or single-GPU training

  • Since the code uses PyTorch Lightning, providing more than one element in the list of GPUs enables multi-GPU training. Set hparams.gpus to [0, 1, 2] for multi-GPU training, or to a single-element list [0] for single-GPU training.
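As a sketch, the two configurations differ only in the gpus list in src/hparams.py (illustrative values; the device indices must exist on your machine):

```python
# In src/hparams.py (illustrative fragment):
gpus = [0, 1, 2]  # multi-GPU training: PyTorch Lightning distributes across all listed devices
# gpus = [0]      # single-GPU training on device 0
```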

Known issues/warnings

PyTorch dataloader

  • If you encounter the error message [W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool), this is a known issue in the PyTorch DataLoader.
  • It will be fixed when PyTorch releases a new Docker container image with an updated version of torch. If you are not using Docker, upgrading to torch > 1.9.1 removes the warning.
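To check whether your local torch build is recent enough, a small helper can compare version strings (version_tuple is a hypothetical helper written here for illustration, not part of this repo; torch is only needed for the commented-out usage):

```python
# Hypothetical helper for comparing torch version strings; not part of the repo.
def version_tuple(v: str) -> tuple:
    # Drop any local build tag such as "+cu113", then parse "major.minor.patch".
    # (Pre-release suffixes like "a0" are not handled in this sketch.)
    return tuple(int(part) for part in v.split("+")[0].split(".")[:3])

# Usage (assuming torch is installed):
#   import torch
#   if version_tuple(torch.__version__) <= (1, 9, 1):
#       print("The Caffe2 thread-pool warning may still appear; consider upgrading.")
```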

Citation information

If you use or build on this method or code for your research, please cite the paper:

@inproceedings{mehta2022neural,
  title={Neural {HMM}s are all you need (for high-quality attention-free {TTS})},
  author={Mehta, Shivam and Sz{\'e}kely, {\'E}va and Beskow, Jonas and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2022}
}

Acknowledgements

The code implementation is based on NVIDIA's implementation of Tacotron 2 and uses PyTorch Lightning for boilerplate-free code.