Does the model take reference audio for TTS like Coqui TTS (zero-shot)?
Closed this issue · 1 comment
PS-AI commented
I saw that you mention CoquiTTS as the code reference in the Readme.md. Does your model take a reference audio WAV file as input along with the text and produce speech in that voice?
gokulkarthik commented
No, we trained the speaker embeddings from a one-hot encoding of the speaker ID in the dataset, so the model won't work directly for unseen speakers. However, you could try to make it work with unseen reference audio by adding a neural network that takes audio as input and approximates the output that our model's speaker embedding layer produces for a speaker ID.
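
For anyone who wants to try the suggestion above, here is a rough PyTorch sketch of the idea, not code from this repository. It assumes the trained speaker embedding layer behaves like an `nn.Embedding` lookup (equivalent to multiplying a one-hot speaker vector by a weight matrix) and that reference audio arrives as mel-spectrograms. The name `ReferenceEncoder`, the GRU architecture, the MSE loss, and all sizes are hypothetical choices for illustration; random tensors stand in for real data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes; the real values depend on the trained TTS model.
NUM_SPEAKERS, EMB_DIM, N_MELS = 4, 16, 80

# Stand-in for the trained model's speaker embedding layer, kept frozen.
speaker_embedding = nn.Embedding(NUM_SPEAKERS, EMB_DIM)
speaker_embedding.requires_grad_(False)


class ReferenceEncoder(nn.Module):
    """Maps a mel-spectrogram (batch, frames, n_mels) to a speaker embedding."""

    def __init__(self, n_mels, emb_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_mels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, mels):
        _, h = self.gru(mels)    # h: (num_layers, batch, hidden)
        return self.proj(h[-1])  # (batch, emb_dim)


encoder = ReferenceEncoder(N_MELS, EMB_DIM)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)


def train_step(mels, speaker_ids):
    """One regression step: match the frozen embedding of each known speaker."""
    target = speaker_embedding(speaker_ids)
    pred = encoder(mels)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Random tensors stand in for (mel, speaker ID) pairs from the training set.
mels = torch.randn(8, 50, N_MELS)
speaker_ids = torch.randint(0, NUM_SPEAKERS, (8,))
losses = [train_step(mels, speaker_ids) for _ in range(20)]

# At inference, the encoder output would replace the ID-based embedding lookup.
with torch.no_grad():
    unseen_embedding = encoder(torch.randn(1, 120, N_MELS))
```

Whether the resulting voice resembles an unseen speaker would depend on how many speakers the embedding table covers, since the encoder can only learn to interpolate among the embeddings it was trained to reproduce.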