Wavebender GAN:

Deep architecture for high-quality and controllable speech synthesis through interpretable features and exchangeable neural synthesizers


This is the official code repository for the paper "Wavebender GAN: An architecture for phonetically meaningful speech manipulation".

For audio examples, visit our demo page.


Data

Store all 13,100 audio clips of the LJ Speech dataset in data/wavs/. Then split them into training and test sets and store the results in wavebender_features_data/train/ and wavebender_features_data/test/. Each of these folders contains .txt files listing the audio files that belong to that set.
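The split itself is not specified in this README. The sketch below shows one way to produce reproducible train/test file lists. Everything in it is an assumption rather than the repository's own procedure: the helper name split_ljspeech, the list file name filelist.txt (one wav file name per line), the 10% test fraction and the seed. Adapt them to whatever the training scripts actually read.

```python
"""Hypothetical helper: split LJ Speech clips into train/test file lists.

Assumptions (not taken from the repository): clips live in data/wavs/,
each split folder gets a list file called "filelist.txt" with one wav
file name per line, and test_fraction of the clips go to the test set.
"""
import random
from pathlib import Path


def split_ljspeech(wav_dir, out_root, test_fraction=0.1, seed=1234):
    """Shuffle the .wav files in wav_dir and write train/test file lists."""
    wavs = sorted(p.name for p in Path(wav_dir).glob("*.wav"))
    if not wavs:
        raise FileNotFoundError(f"no .wav files found in {wav_dir}")
    rng = random.Random(seed)  # fixed seed, so the split is reproducible
    rng.shuffle(wavs)
    n_test = int(len(wavs) * test_fraction)
    splits = {"test": wavs[:n_test], "train": wavs[n_test:]}
    for name, files in splits.items():
        split_dir = Path(out_root) / name
        split_dir.mkdir(parents=True, exist_ok=True)
        # "filelist.txt" is a hypothetical name; rename it to match the scripts.
        (split_dir / "filelist.txt").write_text("\n".join(sorted(files)) + "\n")
    # Return how many clips ended up in each split.
    return {name: len(files) for name, files in splits.items()}
```

For example, split_ljspeech("data/wavs", "wavebender_features_data") would write wavebender_features_data/train/filelist.txt and wavebender_features_data/test/filelist.txt. Seeding a private random.Random instance keeps the split stable across runs without touching the global random state.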

Tacotron 2

Before you start training, download Tacotron 2 and save the WaveGlow checkpoint waveglow_256channels_universal_v5.pt in the main folder, with a copy in the tacotron2 folder as well.
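Because the checkpoint has to be present in two places, a quick pre-flight check can save a failed training run. This is a minimal sketch that assumes the repository root is the current working directory and that Tacotron 2 was downloaded into a folder named tacotron2 there; the function name missing_checkpoints is hypothetical and only checks that the files exist, not that they load.

```python
"""Hypothetical pre-flight check for the WaveGlow checkpoint.

Assumes the repository root is repo_root and that Tacotron 2 was
downloaded into a folder named "tacotron2" inside it.
"""
from pathlib import Path

CHECKPOINT = "waveglow_256channels_universal_v5.pt"


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that do not exist yet."""
    root = Path(repo_root)
    # The README asks for the checkpoint in the main folder and in tacotron2/.
    expected = [root / CHECKPOINT, root / "tacotron2" / CHECKPOINT]
    return [str(p) for p in expected if not p.is_file()]
```

An empty list from missing_checkpoints() means both copies are in place; otherwise the returned paths show where the file still needs to go.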

Training

Wavebender Net and Wavebender GAN are trained separately: run train_wavebender_net.py to train Wavebender Net and train_wavebender_gan.py to train Wavebender GAN. Make sure the data is already in the format described in the Data section before running either script.
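If you want a single entry point that refuses to start before the data is prepared, a small wrapper like the one below could work. It is a sketch, not part of the repository: the names SCRIPTS, REQUIRED_DIRS, build_command and train are hypothetical, and it assumes both training scripts are run from the repository root without command-line arguments (none are documented here).

```python
"""Hypothetical launcher for the two training scripts.

Assumes the scripts take no command-line arguments and are run from the
repository root; the folder check mirrors the layout in the Data section.
"""
import subprocess
import sys
from pathlib import Path

SCRIPTS = {"net": "train_wavebender_net.py", "gan": "train_wavebender_gan.py"}
REQUIRED_DIRS = ("wavebender_features_data/train", "wavebender_features_data/test")


def build_command(model):
    """Return the command that trains Wavebender Net ("net") or GAN ("gan")."""
    if model not in SCRIPTS:
        raise ValueError(f"model must be one of {sorted(SCRIPTS)}, got {model!r}")
    return [sys.executable, SCRIPTS[model]]


def train(model, repo_root="."):
    """Check that the data folders exist, then run one training script."""
    root = Path(repo_root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"prepare the data first; missing: {missing}")
    # The two models are trained separately, so each call runs exactly one script.
    subprocess.run(build_command(model), cwd=root, check=True)
```

Calling train("net") and, in a separate run, train("gan") keeps the two trainings independent, and check=True makes a failing script raise instead of passing silently.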