Speech
deepconsc opened this issue · 4 comments
deepconsc commented
Great work!
Planning to run an FF transformer network in the speech domain overnight with madgrad.
Any heads up?
adefazio commented
My main suggestion would be to make sure you try less weight decay than you would normally use (if any).
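For anyone following along, here is a minimal sketch of what that could look like. It assumes the `madgrad` PyPI package and its `MADGRAD(params, lr, momentum, weight_decay, eps)` constructor; the helper names `madgrad_config` and `build_optimizer` are hypothetical and only there for illustration. The learning rate and momentum shown are the package defaults, not values tuned for speech models.

```python
# Hypothetical helpers; only `madgrad.MADGRAD` is a real API here.

def madgrad_config(lr=1e-2, momentum=0.9, weight_decay=0.0, eps=1e-6):
    """Collect MADGRAD keyword arguments, with weight decay off by default."""
    if weight_decay < 0:
        raise ValueError("weight_decay must be non-negative")
    return {
        "lr": lr,
        "momentum": momentum,
        "weight_decay": weight_decay,  # start at 0, raise it only if needed
        "eps": eps,
    }


def build_optimizer(params, **overrides):
    """Create a MADGRAD optimizer for an iterable of parameters."""
    # Imported lazily so the config helper works without the package installed.
    from madgrad import MADGRAD  # assumes `pip install madgrad`
    return MADGRAD(params, **madgrad_config(**overrides))
```

With a PyTorch model this would be called as `build_optimizer(model.parameters())`, or `build_optimizer(model.parameters(), weight_decay=1e-6)` to try a small non-zero value afterwards.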
deepconsc commented
No weight decay it is:) Thank you!
Will provide feedback after initial run.
deepconsc commented
@adefazio It worked really well during pretraining. Once the discriminator was enabled, though, it didn't progress at the same rate as pure Adam. I should mention that during pretraining madgrad actually did a better job than Adam: it helped to model pitch, energy and duration really well.
GAN-based training would clearly need more careful tuning of madgrad, but it looks promising!
adefazio commented
Thanks for the info! Interesting result.