Question: Using a pretrained encoder for getting the speaker embedding.
nischal-sanil opened this issue · 3 comments
Hi,
Did you experiment with using a pretrained encoder to obtain the speaker embedding, similar to your previous work (AutoVC)?
PS: Amazing work by the way!
Thanks,
@nischal-sanil did you make it work?
Can you check my question please? #28
I have the same question @auspicious3000
Here you use a one-hot encoded embedding of length 82 (the number of speakers the model was pretrained on), but could you generate a zero-shot, general-purpose embedding like in AutoVC? If I remember correctly, the embedding used there was larger, so I assume it cannot be plugged in here directly.
So to wrap up: with the pretrained weights, this method works only on the 82 speakers it was trained and conditioned on, if we consider only the timbre conversion?
@terbed Yes. Unless you retrain the model.
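To illustrate the distinction being discussed, here is a minimal PyTorch sketch contrasting the two conditioning schemes: the one-hot speaker ID used by the released checkpoint versus a d-vector-style embedding from a pretrained speaker encoder, as in AutoVC. The `SpeakerEncoder` class, the 256-dim embedding size, and all variable names are illustrative assumptions, not the repo's actual code.

```python
import torch
import torch.nn.functional as F

NUM_SPEAKERS = 82  # speakers covered by the released pretrained weights

# (a) One-hot speaker conditioning, as in the released checkpoint:
#     each training speaker has a fixed index, so an unseen speaker
#     has no valid embedding at inference time.
speaker_id = torch.tensor([5])                             # index of a seen speaker
one_hot_emb = F.one_hot(speaker_id, NUM_SPEAKERS).float()  # shape (1, 82)

# (b) Zero-shot-style conditioning (AutoVC-like): a pretrained speaker
#     encoder maps any utterance's mel-spectrogram to a dense embedding.
#     This class is a placeholder; the SpeechSplit decoder would have to
#     be retrained to accept this embedding size instead of the 82-dim
#     one-hot vector.
class SpeakerEncoder(torch.nn.Module):
    def __init__(self, n_mels=80, dim_emb=256):
        super().__init__()
        self.lstm = torch.nn.LSTM(n_mels, dim_emb, batch_first=True)

    def forward(self, mel):                    # mel: (batch, frames, n_mels)
        _, (h, _) = self.lstm(mel)
        return F.normalize(h[-1], dim=-1)      # (batch, dim_emb), unit-norm

mel = torch.randn(1, 128, 80)                  # any speaker's utterance
dvector_emb = SpeakerEncoder()(mel)            # shape (1, 256)

# The embedding sizes differ (82 vs. 256), which is why the pretrained
# SpeechSplit weights cannot consume a d-vector without retraining.
print(one_hot_emb.shape, dvector_emb.shape)
```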