Can you share experience on tuning the multipliers c_kl_fwd and c_e2e
feng-yufei opened this issue · 3 comments
Hello Heatz123,
Recently I have been experimenting with adding the bi-directional prior/posterior loss, and I found that during fine-tuning (after the warm-up), this additional loss with a larger multiplier ruins the trained VITS model. I saw that the README mentions the multipliers and that you set c_kl_fwd = 0.001, which is very small, and c_e2e = 0.1, so I would like to check what you observed in your own experiments.
Since I did not find any hint about these parameters in the original NaturalSpeech paper, I assume you tuned them based on your experiments. Did you also observe that a larger c_kl_fwd or c_e2e ruins the model? And have you compared, or do you have a view on, whether a very small c_kl_fwd still affects inference quality?
Thanks
Hello, @feng-yufei.
I have also observed similar issues with larger values of c_kl_fwd and c_e2e hurting model training, which is why they are set so low. Note, however, that these values may not be optimal, since the paper doesn't mention the multipliers it used for the loss terms.
Regarding c_kl_fwd, I haven't rigorously compared the results of c_kl_fwd=0 and c_kl_fwd=0.001 in terms of audio quality. However, even a very small value of c_kl_fwd seemed to significantly lower the loss_fwd term as training progressed. This indicates that the change affects the distribution of the enhanced prior or posterior in some way, possibly in a better direction (reducing the training-inference mismatch), as stated in the paper.
As for the c_e2e term, the authors didn't conduct an ablation study on its use, and I didn't notice any improvement from using it. This term might help if we use the tuning stage (the last 2k epochs), but high values of c_e2e (such as 1.0) certainly ruin training. So I think it is safe to keep it low.
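To make the role of the two multipliers concrete, here is a minimal sketch of how weighted loss terms are typically combined into one objective. It is an illustration only, not the repository's actual training code: the function name, the dictionary keys, and the placeholder loss values are all assumptions; only the names c_kl_fwd and c_e2e and their values (0.001 and 0.1) come from this thread.

```python
def combine_losses(losses, c_kl_fwd=0.001, c_e2e=0.1, use_e2e=True):
    """Weighted sum of named loss terms.

    `losses` maps a term name to a scalar value. The keys "kl_fwd" and
    "e2e" are hypothetical names for the forward KL term and the
    end-to-end term; any other key gets a weight of 1.0.
    """
    weights = {
        "kl_fwd": c_kl_fwd,
        # Setting use_e2e=False drops the end-to-end term entirely,
        # e.g. outside a dedicated tuning stage.
        "e2e": c_e2e if use_e2e else 0.0,
    }
    total = 0.0
    for name, value in losses.items():
        total += weights.get(name, 1.0) * value
    return total


if __name__ == "__main__":
    # Placeholder magnitudes for illustration only, not measured values.
    example = {"mel": 20.0, "kl_bwd": 2.0, "kl_fwd": 500.0, "e2e": 30.0}
    print(combine_losses(example))                 # small multipliers
    print(combine_losses(example, c_kl_fwd=1.0))   # forward KL dominates
```

With a raw forward KL term that is much larger than the other terms, a multiplier near 1.0 lets it dominate the sum, which is one plausible reading of why larger values destabilize training while a small multiplier still lets loss_fwd decrease.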
I hope this helps. Please let me know if you have any further questions or concerns. Thank you.
Thanks for your reply. I will post my further findings, and hopefully some quality comparisons, once several experiments have finished.
@feng-yufei Hey, results please? Feels like c_kl_fwd indeed has to be small