Why skip long utterances?
JRMeyer opened this issue · 1 comment
JRMeyer commented
It seems the training code skips utterances longer than 1,000 frames:
Does this mean you skip all utterances longer than ~12.5 seconds (1,000 frames × 12.5 ms, since the paper reports 12.5 ms as the hop length used to compute the spectrogram)? If so, why?
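A minimal sketch of the filter and the arithmetic behind the question. It assumes the 1,000-frame cutoff and the 12.5 ms hop length quoted in this thread; the names `MAX_FRAMES`, `HOP_MS`, `frames_to_seconds`, and `filter_utterances` are hypothetical and are not taken from the repository.

```python
# Hypothetical constants: the cutoff and hop length are the values quoted in this thread.
MAX_FRAMES = 1000
HOP_MS = 12.5


def frames_to_seconds(n_frames: int, hop_ms: float = HOP_MS) -> float:
    """Approximate audio duration spanned by n_frames spectrogram frames."""
    return n_frames * hop_ms / 1000.0


def filter_utterances(utterances, max_frames: int = MAX_FRAMES):
    """Keep only (utt_id, n_frames) pairs whose length does not exceed max_frames."""
    return [(uid, n) for uid, n in utterances if n <= max_frames]


if __name__ == "__main__":
    print(frames_to_seconds(MAX_FRAMES))  # 12.5
    data = [("a", 400), ("b", 1000), ("c", 1800)]
    print(filter_utterances(data))  # [('a', 400), ('b', 1000)]
```

The duration is approximate because window and edge-padding effects add or remove a frame or two, but 1,000 frames at a 12.5 ms hop does correspond to roughly 12.5 seconds.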
jxzhanggg commented
Yes, you are right. I did this because I found that a few very long utterances forced their whole batch to be padded to a very long length. Such batches then caused out-of-memory errors (or forced me to use a smaller batch size). Removing these very long utterances does not affect training much; it only slightly reduces the training corpus.
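To illustrate why one outlier matters: when a batch is zero-padded to its longest utterance, the allocated size is batch size × maximum length, and activation memory grows at least linearly with that padded length. The sketch below is a hypothetical illustration with made-up frame counts; `padded_frames` and `padding_waste` are not functions from the repository.

```python
from typing import List


def padded_frames(lengths: List[int]) -> int:
    """Frames allocated when every utterance is zero-padded to the batch maximum."""
    return len(lengths) * max(lengths)


def padding_waste(lengths: List[int]) -> float:
    """Fraction of the padded batch that is padding rather than real frames."""
    total = padded_frames(lengths)
    return (total - sum(lengths)) / total


if __name__ == "__main__":
    typical = [300, 350, 400, 420]       # illustrative frame counts
    with_outlier = [300, 350, 400, 2000]  # same batch with one very long utterance
    print(padded_frames(typical))       # 1680
    print(padded_frames(with_outlier))  # 8000
    print(padding_waste(with_outlier))
```

Here a single 2,000-frame utterance makes the batch almost five times larger, and most of it is padding. Sorting or bucketing utterances by length is another common way to reduce padding; the approach described in this thread simply drops the outliers.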