Will it also work on unseen data? Will it be able to convert the voice of an unseen speaker saying content different from the training data, and will we still obtain the disentanglement?
pycodebook opened this issue · 8 comments
I have the same doubt, can someone please clarify?
Thanks in advance.
I have the same question.
You can make it generalize to unseen speakers by training it the same way as AutoVC.
@auspicious3000 could you explain what you mean by "training it the same way as AutoVC"?
Repeat all steps from here https://github.com/auspicious3000/autovc#2train-model ?
Or should I change make_metadata.py in SpeechSplit to embed speaker encodings but still train with the model from SpeechSplit?
@skol101 It means training with generalized speaker embeddings instead of one-hot embeddings.
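To illustrate the difference: a one-hot embedding is tied to the fixed set of training speakers, while a generalized embedding comes from a speaker encoder (AutoVC uses a GE2E-style d-vector model) that maps any utterance, seen or unseen, to a fixed-size vector. Here is a minimal sketch; `toy_encoder` is a hypothetical stand-in for a real pretrained speaker encoder, not the actual model.

```python
import numpy as np

def one_hot_embedding(speaker_idx, num_speakers):
    """One-hot speaker embedding: only valid for speakers seen in training."""
    emb = np.zeros(num_speakers, dtype=np.float32)
    emb[speaker_idx] = 1.0
    return emb

def generalized_embedding(mel_frames, encoder):
    """D-vector-style embedding: the encoder maps any speaker's mel
    spectrogram to a fixed-size vector, so unseen speakers also work."""
    return encoder(mel_frames)

# Toy stand-in for a pretrained speaker encoder (e.g. GE2E):
# average over time, then L2-normalize to get a unit-length vector.
def toy_encoder(mel_frames):
    v = mel_frames.mean(axis=0)
    return v / np.linalg.norm(v)
```

In practice the generalized embedding replaces the one-hot vector wherever the model conditions on speaker identity, which is what training "the same way as AutoVC" refers to.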
I did that -- I used make_metadata.py from AutoVC.
I have now removed the validation part from solver.py in this repo, because there is nothing to validate against (as in solver_encoder.py in AutoVC), and started training.
Am I doing this correctly? Your help is very appreciated.
Sounds correct, but you don't need to remove the validation part.