How many hours long is the training dataset?

Question

How many hours long is the training dataset?

youngsuenXMLY opened this issue 5 years ago · 4 comments

Answer 1 · 2020-02-13T01:34:16.000Z

Hello, I trained on VCTK, and the training process looks like this:

The VCTK dataset has 100+ speakers, each with several utterances.
What if there is only one utterance per speaker?

Answer 2 · 2020-02-13T03:32:43.000Z

Each speaker in the VCTK corpus has a few hundred samples.
I think that if you take only one utterance per speaker, you won't have enough data to train.
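A quick way to check whether a corpus has enough data per speaker is to count utterances per speaker before training. This is a minimal sketch that assumes VCTK-style file names such as `p225_001.wav`, where the speaker ID comes before the first underscore; the helper names and the `min_utts` threshold are hypothetical and not part of this repository.

```python
from collections import Counter
from pathlib import PurePath


def utterances_per_speaker(paths):
    """Count utterances per speaker, assuming VCTK-style names like 'p225_001.wav'
    where the speaker ID is everything before the first underscore."""
    counts = Counter()
    for p in paths:
        speaker_id = PurePath(p).stem.split("_", 1)[0]
        counts[speaker_id] += 1
    return counts


def sparse_speakers(counts, min_utts):
    """Return the (sorted) speaker IDs that have fewer than min_utts utterances."""
    return sorted(s for s, n in counts.items() if n < min_utts)


# Hypothetical file list; in practice you would glob the corpus directory.
files = [
    "wav48/p225/p225_001.wav",
    "wav48/p225/p225_002.wav",
    "wav48/p226/p226_001.wav",
]
counts = utterances_per_speaker(files)
print(dict(counts))
print(sparse_speakers(counts, min_utts=2))
```

Speakers returned by `sparse_speakers` are the ones that would contribute only a handful of examples to any per-speaker objective.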

Answer 3 · 2020-02-13T12:19:18.000Z

What about a situation where we have enough utterances overall, but only 1 or 2 utterances per speaker?

Answer 4 · 2020-02-13T13:28:52.000Z

Hi, that's a good question. I haven't tried a dataset like this, so I don't know how it will perform.
I suspect the model will have difficulty with speaker classification and speaker-adversarial learning, because it needs a very large softmax output layer whose size equals the number of speakers, while each speaker class would have only one or two training examples.
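To illustrate why the number of speakers matters, here is a minimal sketch of a generic speaker-classification head: a linear layer that produces one logit per speaker, followed by a softmax and cross-entropy loss. It assumes this structure and a hypothetical embedding size `hidden_dim = 256`; it is not this repository's code, and it omits the adversarial part (often implemented with a gradient reversal layer that pushes the encoder to make this classifier fail).

```python
import math


def speaker_head_param_count(hidden_dim, n_speakers):
    """Weights plus biases of a linear layer mapping a hidden_dim embedding
    to one logit per speaker (the input to the softmax)."""
    return hidden_dim * n_speakers + n_speakers


def softmax(logits):
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Speaker-classification loss for one utterance whose true speaker index is target."""
    return -math.log(softmax(logits)[target])


hidden_dim = 256  # hypothetical embedding size
for n_speakers in (100, 10_000):
    params = speaker_head_param_count(hidden_dim, n_speakers)
    # All-zero logits give a uniform prediction, whose loss is ln(n_speakers).
    chance_loss = cross_entropy([0.0] * n_speakers, target=0)
    print(n_speakers, params, round(chance_loss, 3))
```

The output layer grows linearly with the number of speakers, and the loss of an uninformed prediction grows as ln(n_speakers). Each speaker also owns a row of `hidden_dim + 1` parameters, and with only one or two utterances per speaker, only those one or two examples act as positive examples for that row.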