hayeong0/DDDM-VC

Log on f0

markrmiller opened this issue · 2 comments

I see that the inference file takes the log of F0 but then never uses it. Is that a bug, and do you take the log for training too? I've experimented a bit with both (at 24 kHz). I didn't train all the way, but I didn't notice much difference.

FYI, I also tried Soft HuBERT fine-tuned on my dataset, taking the hidden state from the 12th layer, and the wav2vec 2.0 model you used crushes it.

Thank you for pointing out the duplicated code. It was left over from reorganizing the code and the subsequent work on the paper.
We use F0 extracted with YAAPT and normalized per speaker, and then obtain the F0 code through the F0 quantizer.
So, we have deleted the confusing line.
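
For readers following along, here is a rough sketch of what per-speaker F0 normalization could look like. This is not the repository's code. It assumes z-score normalization computed over voiced frames only, with unvoiced frames marked as 0 in the input and left at 0 in the output; the function name `normalize_f0_per_speaker` and the `eps` parameter are made up for illustration, and the YAAPT extraction and F0 quantizer steps are not shown.

```python
import numpy as np

def normalize_f0_per_speaker(f0_by_utt, eps=1e-8):
    """Z-score F0 contours for ONE speaker (assumed scheme, not the repo's).

    f0_by_utt: list of 1-D arrays of F0 in Hz, one per utterance,
               where 0 marks an unvoiced frame.
    Returns the normalized contours plus the speaker mean and std.
    """
    # Pool voiced frames across all of the speaker's utterances.
    voiced = np.concatenate([f0[f0 > 0] for f0 in f0_by_utt])
    if voiced.size == 0:
        raise ValueError("no voiced frames found for this speaker")
    mean, std = voiced.mean(), voiced.std()

    normalized = []
    for f0 in f0_by_utt:
        out = np.zeros_like(f0, dtype=np.float64)
        mask = f0 > 0
        # Only voiced frames are shifted and scaled; unvoiced frames stay 0.
        out[mask] = (f0[mask] - mean) / (std + eps)
        normalized.append(out)
    return normalized, mean, std
```

Note that under this assumed scheme a voiced frame exactly at the speaker mean also maps to 0, so a real pipeline would probably carry a separate voiced/unvoiced mask.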

Thank you for sharing your experiences with various experiments!
We have also tried other SSL models such as WavLM, XLS-R, and MMS, which brought advantages in terms of scalability.