jxzhanggg/nonparaSeq2seqVC_code

How much time is the training dataset?

youngsuenXMLY opened this issue ยท 4 comments

How much time is the training dataset?

Hello, I trained the VCTK, and the training process looks like this
image
The VCTK dataset has 100+ speakers, and for every speaker, there are several utterances.
What if the utterance number is 1 for each speaker?

Hello, I trained the VCTK, and the training process looks like this
image
The VCTK dataset has 100+ speakers, and for every speaker, there are several utterances.
What if the utterance number is 1 for each speaker?

Each speaker in VCTK corpus has a few hundreds of samples.
I think if you only take one utterance for each speaker then you don't have enough data to train.

What about the situation that we have enough utterances, but only 1 or 2 utterances for a speaker?

Hi, that's a good question. I haven't try a dataset like this so I don't kown how it'll perform.
I suppose it'll
have difficulty in speaker classification and speaker adversarial learning because there have to be a super-large softmax output layer for our model, i.e equal to number of speakers.