About Speaker Embeddings
Shmuel-Gruel commented
Hello again, I noticed that during training, when using the pretrained speaker encoder and SR, the same speaker embedding is used regardless of the SR augmentation.
(edit: the audio file and spectrogram are also the same, regardless of SR)
I just wanted to double check if this is by design?
Thanks so much for helping everyone :)
OlaWod commented
Yes. SR is only used to help content representation learning, i.e., multiple SR-augmented wavs correspond to the same content representation.
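
To illustrate the point, here is a minimal toy sketch, not the repository's actual code. The names `sr_augment`, `speaker_embedding`, and `build_training_item` are hypothetical, the resize is a crude nearest-neighbour stand-in for SR, and the "speaker encoder" is a placeholder statistic. It only shows the data flow: the SR ratio changes the content-path input, while the speaker embedding and target spectrogram come from the original audio.

```python
def sr_augment(mel, ratio):
    """Toy vertical resize (assumption: mel is a list of frequency-bin rows).

    Resamples the frequency axis by `ratio` with nearest-neighbour lookup,
    then pads with the top bin or cuts so the height matches the original.
    """
    n_bins = len(mel)
    new_bins = max(1, round(n_bins * ratio))
    resized = [mel[min(n_bins - 1, int(i / ratio))] for i in range(new_bins)]
    if new_bins < n_bins:
        resized += [resized[-1]] * (n_bins - new_bins)  # pad back up
    return resized[:n_bins]  # cut back down


def speaker_embedding(wav):
    """Placeholder for a pretrained speaker encoder (hypothetical).

    It sees only the original waveform, never the SR-augmented version.
    """
    mean = sum(wav) / len(wav)
    energy = sum(x * x for x in wav) / len(wav)
    return (mean, energy)


def build_training_item(wav, mel, ratio):
    """Assemble one training item; only the content input depends on `ratio`."""
    return {
        "content_input": sr_augment(mel, ratio),  # varies with SR
        "spk_emb": speaker_embedding(wav),        # same for every SR ratio
        "target_mel": mel,                        # same for every SR ratio
    }


if __name__ == "__main__":
    wav = [0.0, 0.5, -0.5, 0.25]
    mel = [[float(b)] * 3 for b in range(8)]
    a = build_training_item(wav, mel, 0.85)
    b = build_training_item(wav, mel, 1.15)
    print(a["spk_emb"] == b["spk_emb"])              # speaker path unaffected
    print(a["content_input"] == b["content_input"])  # content path differs
```

Because the model is asked to reconstruct the same target from differently resized inputs, the content path is pushed to ignore what SR distorts, while the speaker path is left untouched.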
Shmuel-Gruel commented
Excellent, thank you.
One more question, are the test.txt files used anywhere?
OlaWod commented
Nowhere.
They are only used for experimental tests: we randomly select unseen utterances of seen speakers from these files for testing.
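
A rough sketch of that kind of selection, under stated assumptions: the file-list format (`speaker/utterance.wav`, one entry per line) and the helper name `pick_test_utterances` are hypothetical and not taken from the repository.

```python
import random


def pick_test_utterances(train_lines, test_lines, k, seed=0):
    """Randomly pick up to k test utterances from speakers seen in training.

    Assumes each line looks like 'speaker/utterance.wav' (hypothetical format).
    """
    seen_speakers = {line.split("/")[0] for line in train_lines}
    train_set = set(train_lines)
    # Keep seen speakers only, and drop any utterance that was trained on.
    candidates = [
        line for line in test_lines
        if line.split("/")[0] in seen_speakers and line not in train_set
    ]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(candidates, min(k, len(candidates)))
```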
Shmuel-Gruel commented
Understood! Thank you for your insight.