A conditional generative model based on the variational autoencoder (VAE) that learns disentangled representations for controllable text-to-speech synthesis.
I am working on this paper https://arxiv.org/pdf/1810.07217.pdf
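
As a rough illustration of the VAE machinery this kind of model builds on, here is a minimal sketch of the reparameterization trick and the KL term of the training objective. It is not the paper's architecture: it assumes a diagonal Gaussian posterior and a standard normal prior for simplicity, and the function names (`reparameterize`, `kl_to_standard_normal`, `negative_elbo`) and the `beta` weight are hypothetical names chosen for this example.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    mu and log_var are lists of floats describing a diagonal Gaussian posterior.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumption: standard normal prior, used here only to keep the sketch short.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def negative_elbo(recon_error, mu, log_var, beta=1.0):
    """Reconstruction error plus the (optionally weighted) KL term."""
    return recon_error + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

In a full model, `mu` and `log_var` would come from an encoder over the speech features, and `z` would condition the decoder alongside the text. Fixing or editing individual dimensions of `z` at synthesis time is what makes the output controllable.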