facebookresearch/madgrad

Speech

deepconsc opened this issue · 4 comments

Great work!

Planning to run a feed-forward (FF) transformer network in the speech domain overnight with madgrad.
Any heads up?

My main suggestion would be to make sure you try less weight decay than you would normally use (if any).
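As a rough illustration of that suggestion, here is a minimal sketch that starts from zero weight decay and only raises it deliberately. It assumes the `madgrad` package and PyTorch are installed. The helper names `madgrad_kwargs` and `build_optimizer` are hypothetical, and the `lr` and `momentum` values are simply the library defaults, not settings tuned for speech models.

```python
def madgrad_kwargs(lr=1e-2, momentum=0.9, weight_decay=0.0):
    """Collect keyword arguments for MADGRAD, defaulting to no weight decay.

    Hypothetical helper: lr and momentum mirror the library defaults and
    are placeholders, not tuned values.
    """
    if weight_decay < 0:
        raise ValueError("weight_decay must be non-negative")
    return {"lr": lr, "momentum": momentum, "weight_decay": weight_decay}


def build_optimizer(params, **overrides):
    """Build a MADGRAD optimizer over `params` (hypothetical helper).

    The import is kept local so this module loads without madgrad installed.
    """
    from madgrad import MADGRAD  # pip install madgrad

    return MADGRAD(params, **madgrad_kwargs(**overrides))
```

With a PyTorch model in hand, `build_optimizer(model.parameters())` gives the no-weight-decay baseline, and `build_optimizer(model.parameters(), weight_decay=1e-5)` would be one way to try a small value afterwards (the `1e-5` here is an arbitrary example, not a recommendation).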

No weight decay it is:) Thank you!
Will provide feedback after initial run.

@adefazio It worked really well during pretraining. After the discriminator was enabled, it didn't progress at the same rate as pure Adam. I should mention that during pretraining madgrad actually did a better job than Adam: it helped to model pitch, energy and duration really well.
It's obvious GAN-based training would need better tuning of madgrad, but it looks promising!

Thanks for the info! Interesting result.