Unofficial PyTorch implementation of the Multi-Band MelGAN paper. This implementation uses Seungwon Park's MelGAN repo as a base and the PQMF filter
implementation from this repo.
MelGAN :
Multi-band MelGAN:
Tested on Python 3.6
pip install -r requirements.txt
- Download a dataset for training. Any set of wav files with a sample rate of 22050 Hz will work (e.g. LJSpeech was used in the paper).
- Preprocess:
python preprocess.py -c config/default.yaml -d [data's root path]
- Edit the configuration yaml file.
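Because training expects 22050 Hz audio, it can help to check the dataset before preprocessing. The following is a minimal sketch, not part of this repo: the function name `find_mismatched_wavs` is hypothetical, and it relies on Python's standard `wave` module, which only reads uncompressed PCM wav files.

```python
import wave
from pathlib import Path

EXPECTED_SAMPLE_RATE = 22050  # sample rate stated in this README


def find_mismatched_wavs(root, expected_rate=EXPECTED_SAMPLE_RATE):
    """Return (path, rate) for every wav under root whose rate differs.

    Hypothetical helper, not part of this repo. Only uncompressed PCM
    wav files can be opened by the standard `wave` module.
    """
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatched.append((path, rate))
    return mismatched
```

Calling `find_mismatched_wavs` on the data's root path returns an empty list when every file is already at 22050 Hz; any file it reports would need resampling first.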
python trainer.py -c [config yaml file] -n [name of the run]
- Copy the default config with cp config/default.yaml config/config.yaml, then edit config.yaml.
- Write the root paths of the train and validation files on its 2nd and 3rd lines.
- Each path should contain pairs of *.wav files with their corresponding (preprocessed) *.mel files.
- The data loader parses the list of files within each path recursively.
- For Multi-Band training, pass the config/mb_melgan config file to -c.
tensorboard --logdir logs/
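Since the data loader expects every wav file to have a preprocessed mel file, a quick pairing check before training may save a failed run. This is a sketch under an assumption that should be verified against preprocess.py: that each mel file sits next to its wav and shares its stem (foo.wav and foo.mel). The helper name `find_unpaired_wavs` is hypothetical.

```python
from pathlib import Path


def find_unpaired_wavs(root):
    """Return wav files under root that have no sibling .mel file.

    Assumption (not confirmed by this README): the mel file lives in
    the same directory as the wav and shares its stem, i.e.
    foo.wav -> foo.mel. Adjust if preprocess.py writes elsewhere.
    """
    return [
        wav_path
        for wav_path in sorted(Path(root).rglob("*.wav"))  # recursive, like the data loader
        if not wav_path.with_suffix(".mel").exists()
    ]
```

Running it on both the train and validation roots should return empty lists once preprocessing has completed.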
Check out here.
python inference.py -p [checkpoint path] -i [input mel path]
- Multi-band MelGAN
- MelGAN
- PyTorch implementation of MelGAN
- Official implementation of MelGAN
- Multi, Full-band MelGAN implementation
- NVIDIA's pre-processing
- WaveRNN
BSD 3-Clause License.
- utils/stft.py by Prem Seetharaman (BSD 3-Clause License)
- datasets/mel2samp.py from https://github.com/NVIDIA/waveglow (BSD 3-Clause License)
- utils/hparams.py from https://github.com/HarryVolek/PyTorch_Speaker_Verification (No License specified)
- How to Train a GAN? Tips and tricks to make GANs work by Soumith Chintala
- Official MelGAN implementation by original authors
- Reproduction of MelGAN - NeurIPS 2019 Reproducibility Challenge (Ablation Track) by Yifei Zhao, Yichao Yang, and Yang Gao
- "replacing the average pooling layer with max pooling layer and replacing reflection padding with replication padding improves the performance significantly, while combining them produces worse results"
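To make the quoted finding concrete, here is a toy pure-Python sketch of what the two padding modes and the two pooling modes compute on a 1-D signal. It is only an illustration, not code from this repo or from the cited reproduction; in PyTorch the corresponding layers would be `nn.ReflectionPad1d` / `nn.ReplicationPad1d` and `nn.AvgPool1d` / `nn.MaxPool1d`. The non-overlapping pooling (stride equal to kernel size) is a simplifying assumption.

```python
def reflection_pad(signal, pad):
    """Mirror the signal around its edge samples (edges are not repeated).

    Requires pad < len(signal).
    """
    left = signal[1:pad + 1][::-1]
    right = signal[-pad - 1:-1][::-1]
    return left + signal + right


def replication_pad(signal, pad):
    """Repeat the first and last samples `pad` times each."""
    return [signal[0]] * pad + signal + [signal[-1]] * pad


def average(window):
    """Arithmetic mean of a window of samples."""
    return sum(window) / len(window)


def pool(signal, kernel, reduce_fn):
    """Non-overlapping pooling (stride == kernel); a trailing remainder is dropped."""
    return [
        reduce_fn(signal[start:start + kernel])
        for start in range(0, len(signal) - kernel + 1, kernel)
    ]


# reflection_pad([1, 2, 3, 4], 2)  -> [3, 2, 1, 2, 3, 4, 3, 2]
# replication_pad([1, 2, 3, 4], 2) -> [1, 1, 1, 2, 3, 4, 4, 4]
# pool([1, 5, 2, 2], 2, average)   -> [3.0, 2.0]
# pool([1, 5, 2, 2], 2, max)       -> [5, 2]
```

Reflection padding extends the signal with a mirrored copy of its interior, while replication padding holds the edge value constant; average pooling smooths each window, while max pooling keeps only its peak.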