yxlllc/DDSP-SVC

Question about inference

Closed · 1 comment

When I run inference with the model I trained, a lot of the original source voice seems to remain in the output. Can I increase the proportion of the voice I trained on? (Something like the add_noise_step function of the diff-svc model.)

yxlllc commented

You didn't give many details, but in general the contentvec768l12 encoder has less timbre leakage.