jxzhanggg/nonparaSeq2seqVC_code

Why skip long utterances?

JRMeyer opened this issue · 1 comments

It seems the training code skips utterances longer than 1,000 frames:

if int(n_frame) >= 1000:

Does this mean you skip all utterances longer than ~12.5 seconds (the paper reports a 12.5 ms hop for computing the spectrogram)? If so, why?

Yes, you are right. I did this because a few very long utterances force the whole batch to be padded to their length. Such batches cause out-of-memory errors (or you have to use a small batch size). Eliminating these very long utterances does not affect training much; it only slightly reduces the training corpus.
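To illustrate the point about padding cost (this is a toy sketch, not the repo's actual data loader; the helper names and the `(n_frames, n_mels)` layout are assumptions), note that a padded batch occupies memory proportional to its longest member, so a single long outlier inflates the whole batch:

```python
import numpy as np

MAX_FRAMES = 1000  # ~12.5 s at a 12.5 ms hop, matching the skip rule above

def filter_utterances(utterances, max_frames=MAX_FRAMES):
    """Drop utterances whose frame count meets or exceeds the cutoff
    (mirrors the `if int(n_frame) >= 1000` check)."""
    return [u for u in utterances if u.shape[0] < max_frames]

def padded_batch_frames(utterances):
    """Total frames a zero-padded batch occupies: every utterance is
    padded up to the length of the longest one."""
    return len(utterances) * max(u.shape[0] for u in utterances)

# Toy corpus: three short clips plus one very long outlier (80 mel bins).
corpus = [np.zeros((n, 80)) for n in (300, 400, 350, 5000)]

print(padded_batch_frames(corpus))                     # 4 * 5000 = 20000
print(padded_batch_frames(filter_utterances(corpus)))  # 3 * 400  = 1200
```

Dropping the one outlier cuts the padded batch from 20,000 to 1,200 frames while losing only one training example, which is the trade-off the author describes.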