TensorSpeech/TensorFlowTTS

[MB_Melgan] Why is a model trained on the generator only better than one trained on both?

ggpid opened this issue · 0 comments

ggpid commented

Why is a model trained on the generator only (200k steps) better in quality than a model trained on both the generator and the discriminator (1M steps)?

I'm fine-tuning Multi-band MelGAN on my own dataset, which is in Korean and about 40 minutes long.
I used the KSS pretrained model with the --pretrained parameter.

  • trained generator only (200k steps): audio

  • trained generator and discriminator (1M steps): audio

image