How to generate more diverse voices by presetting a random latent style embedding?

Question

blldd opened this issue a year ago · 2 comments

Hello Yinghao, I'm very interested in generating more diverse voices (in timbre) by setting random latent embeddings, but the generated audio is very bad. Is there something wrong with my preset vectors? How should I set the random vectors correctly?

Answer 1 · 2023-12-17T04:05:49.000Z

The latent style space isn't exactly random. It actually lies on a high-dimensional sphere. You will have to randomly sample Gaussian noise, normalize it to norm 1, and then scale it up to the average norm of the styles for your model. Another way is to use a reference voice to compute the style and then sample by setting alpha=1, beta=1.
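
The sampling procedure described above can be sketched in a few lines of NumPy. This is only a minimal sketch, not code from the model's repository: `sample_style_on_sphere` and `reference_styles` are hypothetical names, and it assumes you have already collected style vectors computed by your model from some reference audio into an array of shape (num_styles, style_dim).

```python
import numpy as np

def sample_style_on_sphere(reference_styles, rng=None):
    """Draw one random style vector whose norm equals the average norm
    of the given reference styles (hypothetical helper, for illustration)."""
    rng = np.random.default_rng() if rng is None else rng
    reference_styles = np.asarray(reference_styles, dtype=np.float64)

    # Average L2 norm of the styles the model actually produces.
    target_norm = np.linalg.norm(reference_styles, axis=1).mean()

    # Gaussian noise normalized to norm 1 gives a uniformly random direction.
    noise = rng.standard_normal(reference_styles.shape[1])
    unit = noise / np.linalg.norm(noise)

    # Scale the unit vector up to the average style norm.
    return unit * target_norm
```

The returned vector would still need to be converted to the tensor type and shape your inference code expects. Matching the norm this way does not guarantee a natural-sounding voice; it only follows the steps described in the answer.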

Answer 2 · 2024-07-26T06:45:29.000Z

How can I change the values of alpha and beta while finetuning? I do not want the model to recompute each time I run inference (I can only find the values that change timbre and prosody in the inference notebook, not in the finetuning config_ft.yml file).