About Speaker Embeddings

Question

Closed this issue 2 years ago · 4 comments

Hello again. I noticed that during training, when the pretrained speaker encoder and SR are both used, the same speaker embedding is used regardless of the SR augmentation.
(edit: the audio file and spectrogram are also the same, regardless of SR)
I just wanted to double-check: is this by design?

Thanks so much for helping everyone :)

Answer 1 · 2023-01-23T12:16:18.000Z

Yes. SR is only used to help content representation learning, i.e., multiple SR-augmented wavs of an utterance correspond to the same content representation.
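To make the point above concrete, here is a minimal sketch of how a training item could be assembled under this design. It is not the repository's actual data loader: the variant labels, the `make_toy_store` and `get_training_item` helpers, and the string placeholders standing in for tensors are all hypothetical. The only thing it tries to show is that the content input changes with the SR variant while the speaker embedding, spectrogram, and audio are looked up by utterance alone.

```python
# Hypothetical SR variant labels; the actual augmentation settings are not stated here.
SR_VARIANTS = ["sr_low", "sr_mid", "sr_high"]


def make_toy_store(utt_id):
    """Build placeholder features for one utterance (strings stand in for tensors)."""
    return {
        "audio": {utt_id: f"{utt_id}.wav"},
        "spectrogram": {utt_id: f"spec({utt_id})"},
        "speaker_embedding": {utt_id: f"spk_emb({utt_id})"},
        # One content-encoder input per SR-augmented copy of the utterance.
        "content_input": {(utt_id, v): f"content_in({utt_id}, {v})" for v in SR_VARIANTS},
    }


def get_training_item(utt_id, variant, store):
    """Assemble one training item; only the content input depends on the variant."""
    return {
        "content_input": store["content_input"][(utt_id, variant)],
        # Looked up by utterance only, so identical for every SR variant.
        "speaker_embedding": store["speaker_embedding"][utt_id],
        "spectrogram": store["spectrogram"][utt_id],
        "audio": store["audio"][utt_id],
    }


store = make_toy_store("utt001")
items = [get_training_item("utt001", v, store) for v in SR_VARIANTS]
for item in items:
    print(item["content_input"], "|", item["speaker_embedding"])
```

In a real training loop one variant would presumably be picked at random per step; the sketch loops over all of them only to show that the non-content fields never change.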

Answer 2 · 2023-01-23T19:28:16.000Z

Excellent, thank you.
One more question: are the test.txt files used anywhere?

Answer 3 · 2023-01-24T03:11:05.000Z

Nowhere.
They are only used for experimental testing. We randomly select unseen utterances of seen speakers from these files for testing.
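For anyone who wants to do the same kind of spot check, a rough sketch of the random selection might look like the following. The file layout is an assumption (one utterance path per line), and `sample_test_utterances` and the toy paths are made up for illustration; they are not part of the repository.

```python
import random


def sample_test_utterances(lines, n, seed=0):
    """Randomly pick up to n non-empty entries from a held-out file list."""
    entries = [ln.strip() for ln in lines if ln.strip()]
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(entries, min(n, len(entries)))


# Toy stand-in for the contents of a test.txt file (assumed: one path per line).
toy_lines = ["spk1/utt_010.wav\n", "spk1/utt_011.wav\n", "\n", "spk2/utt_020.wav\n"]
picked = sample_test_utterances(toy_lines, n=2)
print(picked)
```

With a real file, `lines` could simply be the result of `open("test.txt").readlines()`.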

Answer 4 · 2023-01-24T05:08:54.000Z

Understood! Thank you for your insight.