yxlllc/DDSP-SVC

I wonder what enhancer_adaptive_key does

Closed · 1 comment

I'm not sure what enhancer_adaptive_key does or how it differs from the plain key variable. After enabling it, the key of the original music stays the same, but the singer's tone sounds like the tone they would have when singing a little higher. Is this correct?

yxlllc commented

The purpose of this option is to work around a limitation of the nsf-hifigan vocoder, which in normal use only covers a limited vocal range. The highest usable key is about G5; above G5, the sound quality drops sharply.

If you set enhancer_adaptive_key to n, the highest allowed key shifts to G5 + n, but quality on low keys gets worse. Specifically, this is achieved by resampling (lowering the pitch by n keys) -> extracting mel -> synthesizing wav (via nsf-hifigan) -> resampling (raising the pitch by n keys).
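The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: it assumes n is counted in semitones, it uses a simple linear-interpolation resampler, and the `vocoder` argument is a hypothetical stand-in for the "extract mel, then synthesize with nsf-hifigan" stage. All function names here are made up for the example.

```python
import math

# G5 is MIDI note 79; with A4 = 440 Hz (MIDI 69) that is about 783.99 Hz.
G5_HZ = 440.0 * 2 ** ((79 - 69) / 12)


def shifted_ceiling_hz(n_semitones):
    """Approximate highest usable pitch after shifting the ceiling by n semitones."""
    return G5_HZ * 2 ** (n_semitones / 12)


def resample_linear(samples, ratio):
    """Linear-interpolation resampling; output has about len(samples) * ratio points.

    Played back at the same sample rate, ratio > 1 lowers the pitch
    and ratio < 1 raises it.
    """
    out_len = max(1, round(len(samples) * ratio))
    out = []
    for i in range(out_len):
        pos = i / ratio
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def adaptive_key_pipeline(samples, f0_hz, n_semitones, vocoder):
    """Lower the pitch by n, run the (hypothetical) vocoder stage, raise it back."""
    factor = 2 ** (n_semitones / 12)
    lowered = resample_linear(samples, factor)    # step 1: pitch down by n
    lowered_f0 = [f / factor for f in f0_hz]      # f0 moves down with the audio
    synthesized = vocoder(lowered, lowered_f0)    # steps 2-3: mel + synthesis stand-in
    return resample_linear(synthesized, 1 / factor)  # step 4: pitch back up by n


if __name__ == "__main__":
    sr = 44100
    tone = [math.sin(2 * math.pi * 880.0 * t / sr) for t in range(4410)]
    out = adaptive_key_pipeline(tone, [880.0], 2, lambda audio, f0: audio)
    print(len(tone), len(out), round(shifted_ceiling_hz(2), 2))
```

The point of the sketch is that the vocoder only ever sees pitches lowered by n, so a note above G5 can fall back inside its comfortable range. By the same logic, notes that were already low are pushed even lower, which fits the worse quality on low keys mentioned above.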

This option does not work when the shallow diffusion model (version 3.0) is used.