How much data (in duration) does the training dataset need?
youngsuenXMLY opened this issue · 4 comments
Hello, I trained on VCTK, and the training process looks like this:
The VCTK dataset has 100+ speakers, and for every speaker, there are several utterances.
What if there is only one utterance per speaker?
Each speaker in the VCTK corpus has a few hundred samples.
I think if you only take one utterance per speaker, you won't have enough data to train.
What about the case where we have enough utterances overall, but only 1 or 2 utterances per speaker?
Hi, that's a good question. I haven't tried a dataset like this, so I don't know how it'll perform.
I suppose it'll have difficulty with speaker classification and speaker adversarial learning, because our model would need a very large softmax output layer, i.e. one with as many outputs as there are speakers.
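To make the scaling concrete, here is a rough sketch of how the speaker softmax layer grows with the number of speakers. It assumes a single linear layer feeding the softmax. The `hidden_dim` of 256, the helper name `softmax_layer_params`, and the speaker/utterance counts are hypothetical values chosen for illustration, not taken from this repo.

```python
def softmax_layer_params(hidden_dim: int, num_speakers: int) -> int:
    """Weights plus biases of one linear layer that feeds a softmax over speakers."""
    return hidden_dim * num_speakers + num_speakers


# Hypothetical feature size; the real model may use a different one.
hidden_dim = 256

# (number of speakers, utterances per speaker): a VCTK-like setup
# versus a many-speakers, few-utterances setup. Both are made-up numbers.
scenarios = [(100, 300), (50_000, 2)]

for num_speakers, utts_per_speaker in scenarios:
    params = softmax_layer_params(hidden_dim, num_speakers)
    print(
        f"{num_speakers} speakers, {utts_per_speaker} utterances each: "
        f"{params} output-layer parameters, "
        f"{utts_per_speaker} training examples per class"
    )
```

In the second scenario the output layer is hundreds of times larger, while each class only sees 1 or 2 examples, which is why I'd expect the classifier to struggle.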