
normalize_flow is implemented differently from the official VITS code

Opened this issue · 2 comments

nzpeng commented

Hi, I noticed that the implementation of ResidualCouplingLayer.forward (in normalize_flow.py) differs from the official VITS code, and this part is not described in the VITS2 paper. What principle is your implementation based on, and what are its advantages? It seems more reasonable.

nzpeng commented

Additionally, there are two KL losses in train.py:
loss_kl_dur = kl_loss(z_q_dur, logs_q_dur, m_p_dur, logs_p_dur, z_mask) * hps.train.c_kl_dur
loss_kl_audio = kl_loss_normal(m_p_audio, logs_p_audio, m_q_audio, logs_q_audio, z_mask) * hps.train.c_kl_audio
How should I understand these two terms? What principle are they based on?

@nzpeng The flow module with two KL losses is the bidirectional prior/posterior module proposed in NaturalSpeech [1].
In my experience, it seems to outperform the original VITS flow module in terms of speaker similarity and training speed.

[1] NaturalSpeech https://arxiv.org/pdf/2205.04421.pdf
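
For readers comparing the two KL terms quoted above, here is a minimal scalar sketch of the two standard ways to compute a KL divergence between diagonal Gaussians parameterized by a mean and a log standard deviation. The first follows the form of kl_loss in the official VITS code, which evaluates the prior term at a sampled latent. The second is the textbook closed form that needs only the means and log standard deviations. The function names kl_sampled and kl_analytic, the argument order, and the omission of z_mask are assumptions made for illustration; this is not taken from this repository, and its kl_loss_normal may be defined differently.

```python
import math
import random


def kl_sampled(z_q, logs_q, m_p, logs_p):
    """Per-element KL(q || p) estimate evaluated at one sample z_q drawn from q.

    Same form as kl_loss in the official VITS code, without the mask and
    the averaging over elements.
    """
    kl = logs_p - logs_q - 0.5
    kl += 0.5 * (z_q - m_p) ** 2 * math.exp(-2.0 * logs_p)
    return kl


def kl_analytic(m_q, logs_q, m_p, logs_p):
    """Closed-form KL(N(m_q, exp(logs_q)^2) || N(m_p, exp(logs_p)^2))."""
    kl = logs_p - logs_q - 0.5
    kl += 0.5 * (math.exp(2.0 * logs_q) + (m_q - m_p) ** 2) * math.exp(-2.0 * logs_p)
    return kl


def mean_sampled_kl(m_q, logs_q, m_p, logs_p, n=200_000, seed=0):
    """Average kl_sampled over n draws z ~ q (hypothetical helper for checking)."""
    rng = random.Random(seed)
    sigma_q = math.exp(logs_q)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(m_q, sigma_q)
        total += kl_sampled(z, logs_q, m_p, logs_p)
    return total / n
```

Averaged over many samples from q, the sampled form converges to the closed form, so the two differ in how the value is estimated rather than in the quantity being measured. Which latents and statistics each term receives in train.py depends on the repository's own definitions.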