yl4579/StyleTTS2

How to generate a more diverse voice by presetting a random latent style embedding?

blldd opened this issue · 2 comments

blldd commented

Hello Yinghao, I'm very interested in generating more diverse voices (in timbre) by setting random latent style embeddings, but the generated audio is very bad. Is there something wrong with my preset vectors? How should I set the random vectors correctly?

yl4579 commented

The latent style space isn't exactly random; it actually lies on a high-dimensional sphere. You will have to sample random Gaussian noise, normalize it to norm 1, and then scale it up to the average norm of the styles for your model. Another way is to just use a reference voice to compute the style and then sample by setting alpha=1, beta=1.
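A minimal NumPy sketch of both suggestions follows. It is not code from the StyleTTS2 repo. `sample_random_style` assumes you have already collected a hypothetical `(N, D)` array of style vectors computed from real reference audio, so that their average norm can be measured. `blend_styles` is a hypothetical helper that assumes the layout used in the StyleTTS2 demo notebook: a 256-dim style whose first 128 dims are the acoustic (timbre) part and last 128 the prosodic part, mixed as `alpha * sampled + (1 - alpha) * reference` and `beta * sampled + (1 - beta) * reference` respectively.

```python
import numpy as np


def sample_random_style(reference_styles: np.ndarray,
                        rng: np.random.Generator) -> np.ndarray:
    """Draw a random style vector on the sphere where the model's styles lie.

    reference_styles: (N, D) array of style vectors computed from real
    reference audio (hypothetical input, assumed to be collected beforehand).
    """
    # The average L2 norm of known styles sets the radius of the sphere.
    target_norm = np.linalg.norm(reference_styles, axis=1).mean()
    # Isotropic Gaussian noise gives a uniformly random direction.
    noise = rng.standard_normal(reference_styles.shape[1])
    direction = noise / np.linalg.norm(noise)  # normalize to norm 1
    return direction * target_norm             # scale up to the average norm


def blend_styles(sampled: np.ndarray, reference: np.ndarray,
                 alpha: float, beta: float, split: int = 128) -> np.ndarray:
    """Mix a sampled style with a reference style (hypothetical helper).

    Assumes dims [:split] are the acoustic (timbre) part and dims [split:]
    the prosodic part, mirroring the StyleTTS2 demo notebook.
    """
    timbre = alpha * sampled[..., :split] + (1 - alpha) * reference[..., :split]
    prosody = beta * sampled[..., split:] + (1 - beta) * reference[..., split:]
    return np.concatenate([timbre, prosody], axis=-1)
```

With `alpha=1, beta=1`, `blend_styles` returns the sampled style unchanged, so the reference voice contributes nothing to timbre or prosody; lower values pull the result back toward the reference.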

> Another way is just use a reference voice to compute the style and then sample by setting alpha=1, beta=1.

How can I change the values of alpha and beta while fine-tuning? I do not want the model to recompute this each time I run inference. (I can only find the values that change timbre and prosody in the inference notebook, not in the fine-tuning config file config_ft.yml.)