Mismatch shape for f0 prediction and KL divergence

Question

Closed this issue a year ago · 2 comments

Hi, I recently found your VISinger implementation (it is awesome), but I ran into a few problems.
The first is a shape mismatch in the f0 prediction and in the KL divergence calculation (logs_p and logs_q):

x: torch.Size([6, 192, 40])
x_frame: torch.Size([6, 192, 581]) 
ground truth f0: torch.Size([6, 558]) 
pred f0: torch.Size([6, 581])

I am also curious about how you chose the hyperparameters. For instance, I understand that 2595 and 700 come from the Hz-to-mel conversion, but how did you decide on 500 here?
featur_pit = 2595.0 * np.log10(1.0 + featur_pit / 700.0) / 500

Answer 1 · 2023-01-15T20:03:01.000Z

The mel length mismatch (581 vs. 558 frames) should be resolved during preprocessing. Make sure the script you run is python prepare/data_vits_phn.py, not python prepare/data_vits.py, which comes from the original repo.
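A length mismatch like this can be caught right after preprocessing instead of inside the loss. The following is only a minimal sketch: check_frame_alignment is a hypothetical helper, not a function from the repo, and the zero arrays simply stand in for a spectrogram and an f0 track with the frame counts reported above.

```python
import numpy as np

def check_frame_alignment(spec, f0):
    """Return the shared frame count, or raise if the lengths differ.

    Hypothetical helper: assumes the last axis of both arrays is time.
    """
    n_spec = spec.shape[-1]
    n_f0 = f0.shape[-1]
    if n_spec != n_f0:
        raise ValueError(
            f"frame mismatch: spectrogram has {n_spec} frames, f0 has {n_f0}"
        )
    return n_spec

# Placeholder arrays with the frame counts from the report above.
spec = np.zeros((192, 581))
f0 = np.zeros(558)

try:
    check_frame_alignment(spec, f0)
except ValueError as err:
    print(err)
```

Running a check of this kind over every preprocessed item would flag the affected files before training starts.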
As for the hyperparameters, I simply emailed the original author and asked him how he calculated the pitch. I think the reason he divides by 500 is to balance the pitch loss so that it won't be too large.
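To illustrate the scaling, here is a small sketch of the formula quoted in the question. The function name hz_to_scaled_mel, the sample f0 values, and the use of mean squared error are assumptions made for the example; the repo may compute the pitch loss differently.

```python
import numpy as np

def hz_to_scaled_mel(f0_hz, scale=500.0):
    """Convert f0 in Hz to the mel scale, then divide by `scale`."""
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    return 2595.0 * np.log10(1.0 + f0_hz / 700.0) / scale

# Hypothetical target and predicted f0 values in Hz.
target_hz = np.array([110.0, 220.0, 440.0, 880.0])
pred_hz = np.array([120.0, 210.0, 450.0, 860.0])

# scale=1.0 gives plain mel values (hundreds); scale=500 gives values near 1.
mel_target = hz_to_scaled_mel(target_hz, scale=1.0)
mel_pred = hz_to_scaled_mel(pred_hz, scale=1.0)
scaled_target = hz_to_scaled_mel(target_hz)
scaled_pred = hz_to_scaled_mel(pred_hz)

mse_mel = np.mean((mel_pred - mel_target) ** 2)
mse_scaled = np.mean((scaled_pred - scaled_target) ** 2)

print(scaled_target)         # 440 Hz maps to roughly 1.1
print(mse_mel / mse_scaled)  # ratio is 500**2
```

Because the division is a constant factor, a squared-error loss shrinks by 500 squared, which is consistent with the explanation that the constant keeps the pitch term from dominating the other losses.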

Answer 2 · 2023-01-18T16:40:32.000Z

Thank you so much for your reply! I explicitly resampled the wav files to 24 kHz, and it works well now!
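For readers who hit the same error, the sketch below shows why the sample rate matters for frame counts. The hop length of 256, the 3-second duration, and the 22050 Hz source rate are illustrative assumptions, not values taken from this thread or the repo; only the 24 kHz target comes from the reply above.

```python
def n_frames(duration_s, sample_rate, hop_length=256):
    """Number of full hops in a clip (hop_length is an assumed value)."""
    n_samples = int(duration_s * sample_rate)
    return n_samples // hop_length

# The same clip yields different frame counts at different sample rates.
frames_source = n_frames(3.0, 22050)   # assumed original rate
frames_target = n_frames(3.0, 24000)   # rate mentioned in the reply
print(frames_source, frames_target)
```

If one feature is extracted from audio at one rate and another feature assumes a different rate, their lengths will disagree in the same way the f0 and spectrogram lengths did here.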