OlaWod/FreeVC

About Speaker Embeddings


Hello again. I noticed that during training, when using the pretrained speaker encoder and SR, the same speaker embedding is used regardless of the SR augmentation.
(Edit: the audio file and spectrogram are also the same, regardless of SR.)
I just wanted to double-check whether this is by design.

Thanks so much for helping everyone :)

Yes. SR is only used to help content representation learning, i.e., multiple SR-augmented wavs correspond to the same content representation. The speaker embedding, audio, and spectrogram are therefore left unchanged on purpose.
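To illustrate the idea, here is a minimal sketch of how one training item could be assembled so that only the content input depends on the SR variant. This is not the repository's actual data loader: the function name, the file naming scheme, the utterance ID, and the variant IDs are all hypothetical and chosen only for illustration.

```python
import random


def build_training_item(utt_id, sr_variants, rng):
    """Assemble one training item (hypothetical layout, not FreeVC's loader).

    Only the content-encoder input depends on the randomly chosen SR variant;
    the speaker embedding and the reconstruction targets always come from the
    original, un-augmented utterance.
    """
    variant = rng.choice(sr_variants)
    return {
        # Varies from call to call: one of several SR-augmented versions.
        "content_input": f"{utt_id}_sr{variant}.wav",
        # Fixed for a given utterance, whatever SR variant was picked.
        "speaker_embedding": f"{utt_id}.spk.npy",
        "target_audio": f"{utt_id}.wav",
        "target_spec": f"{utt_id}.spec.pt",
    }


rng = random.Random(0)
# Hypothetical utterance ID and SR variant IDs.
items = [build_training_item("utt_001", [0, 1, 2], rng) for _ in range(5)]
for item in items:
    print(item["content_input"], item["speaker_embedding"])
```

Across the printed items the content input may change while the speaker embedding path stays the same, which is the behaviour described above.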

Excellent, thank you.
One more question: are the test.txt files used anywhere?

Nowhere.
They are only used for our experimental tests: we randomly select unseen utterances of seen speakers from these files for testing.
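As a rough illustration of that kind of selection (not the script we used), the sketch below draws a random subset from a filelist. It assumes one utterance entry per line; the function name, the sample entries, and the seed are hypothetical.

```python
import random


def sample_test_utterances(filelist_lines, n, seed=0):
    """Randomly pick up to n entries from a filelist (hypothetical helper).

    Assumes one utterance entry per line; blank lines are ignored.
    """
    entries = [line.strip() for line in filelist_lines if line.strip()]
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(entries, min(n, len(entries)))


# Hypothetical filelist contents, standing in for the lines of test.txt.
example_lines = ["spk1/utt_a.wav\n", "spk1/utt_b.wav\n", "\n", "spk2/utt_c.wav\n"]
print(sample_test_utterances(example_lines, 2))
```

With a real file, the lines could be read with `open(path).readlines()` and passed in the same way.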

Understood! Thank you for your insight.