NVIDIA/waveglow

Issue training for male dataset

zshakeri opened this issue · 1 comment

I have been trying to train the model on a male-voice dataset, both from scratch and by fine-tuning the provided checkpoint. I tried the default parameters (batch size 3, 8 GPUs), then increased the batch size to 32 on 8 GPUs and experimented with the learning rate. In all cases, the loss saturates at about -5 after roughly 5k-20k steps and then either increases or blows up. Do you have any suggestions for what to do in this case? Have you trained the model on any dataset other than LJ?
Examples of training loss curves:
[Image: Screen Shot 2020-12-16 at 10 38 57 AM]
[Image: Screen Shot 2021-01-07 at 11 47 22 AM]