- Clone this repository: `git clone https://github.com/NitheshChandher/Neural-HMM.git`
- Initialise the submodules: `git submodule init; git submodule update`
- Make sure you have Docker installed and running.
- Follow the instructions to set up NVIDIA Docker.
- Download our pre-trained LJ Speech models.
- Download the pretrained HiFi-GAN models and config file and place them in the `hifigan` folder.
- Run `bash start.sh` and it will install all the dependencies and run the container (a quick GPU check follows this list).
- Run Jupyter Notebook and open `speech_synthesis_workshop.ipynb`.
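Before continuing, it can be worth confirming that PyTorch inside the container actually sees your GPUs. A minimal check, assuming PyTorch was installed by `start.sh`:

```python
import torch

# True only if CUDA and the NVIDIA Docker runtime are set up correctly
print(torch.cuda.is_available())

# List the devices that training will be able to use
for i in range(torch.cuda.device_count()):
    print(f"{i}: {torch.cuda.get_device_name(i)}")
```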
- Download and extract the CMU Arctic dataset into the `data` directory. Also, move the files `train.txt` and `validation.txt` to the `data/filelists` directory (a sanity-check sketch follows this list).
- Check `src/hparams.py` for hyperparameters and set GPUs:
  - For multi-GPU training, set GPUs to `[0, 1, ...]`
  - For CPU training (not recommended), set GPUs to an empty list `[]`
  - Check the location of transcriptions.
- Once hparams are updated, run `python generate_data_properties.py` to generate `data_parameters.pt` for your dataset.
- Run `python train.py -c 'checkpoints/Neural-HMM(Female).ckpt'` to fine-tune the pre-trained LJ Speech model on the CMU Arctic dataset.
  - Checkpoints will be saved in `hparams.checkpoint_dir`.
  - TensorBoard logs will be saved in `hparams.tensorboard_log_dir`.
- To resume training, run `python train.py -c <CHECKPOINT_PATH>`.
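The filelists are easy to get wrong when moving data around, so a quick sanity check can save a failed training run. The sketch below verifies that every audio path in the filelists points to an existing file; the Tacotron 2-style `wav_path|transcription` line format is an assumption, so adjust the parsing if the actual files differ:

```python
from pathlib import Path

# Hypothetical sanity check, assuming Tacotron 2-style "wav_path|transcription" lines
for filelist in ("data/filelists/train.txt", "data/filelists/validation.txt"):
    for lineno, line in enumerate(Path(filelist).read_text().splitlines(), start=1):
        wav_path = line.split("|", maxsplit=1)[0]
        if not Path(wav_path).is_file():
            print(f"{filelist}:{lineno}: missing audio file {wav_path}")
```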
- In `src/hparams.py`, change `hparams.precision` to `16` for mixed precision and `32` for full precision.
- Since the code uses PyTorch Lightning, providing more than one element in the list of GPUs will enable multi-GPU training. So change `hparams.gpus` to `[0, 1, 2]` for multi-GPU training and to a single element `[0]` for single-GPU training (see the sketch after this list).
- If you encounter the warning `[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)`, this is a known issue in the PyTorch DataLoader. It will be fixed when PyTorch releases a new Docker container image with an updated version of Torch. If you are not using Docker, it goes away with `torch > 1.9.1`.
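For reference, the precision and GPU settings above might look like this in `src/hparams.py`. The attribute names come from the notes in this list; the surrounding layout of the file and the directory values are assumptions, so treat this as a sketch rather than the file's actual contents:

```python
# Sketch of the relevant settings in src/hparams.py (layout and values illustrative)
gpus = [0]        # [0, 1, 2] enables multi-GPU training; [] forces CPU (not recommended)
precision = 16    # 16 = mixed precision, 32 = full precision
checkpoint_dir = "checkpoints"            # where training checkpoints are written
tensorboard_log_dir = "tensorboard_logs"  # where TensorBoard logs are written
```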
If you use or build on this method or code for your research, please cite the paper:
```
@inproceedings{mehta2022neural,
  title={Neural {HMM}s are all you need (for high-quality attention-free {TTS})},
  author={Mehta, Shivam and Sz{\'e}kely, {\'E}va and Beskow, Jonas and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2022}
}
```
The code implementation is based on Nvidia's implementation of Tacotron 2 and uses PyTorch Lightning for boilerplate-free code.