jxzhanggg/nonparaSeq2seqVC_code

Why skip long utterances?

JRMeyer opened this issue · 1 comments

It seems the training code skips utterances longer than 1,000 frames:

if int(n_frame) >= 1000:

Does this mean you skip all utterances longer than ~12.5 seconds (the paper reports a 12.5 ms hop for computing the spectrogram)? If so, why?

Yes, you are right. I did this because a few very long utterances force the whole batch to be padded to their length. Such batches cause out-of-memory errors (or you have to use a small batch size). Eliminating these very long utterances does not affect training much; it only slightly reduces the training corpus.
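To illustrate the point about padding cost (this is a toy sketch, not the repo's actual data loader; the helper names and the `(n_frames, n_mels)` layout are assumptions), note that a padded batch occupies memory proportional to its longest member, so a single long outlier inflates the whole batch:

```python
import numpy as np

MAX_FRAMES = 1000  # ~12.5 s at a 12.5 ms hop, matching the skip rule above

def filter_utterances(utterances, max_frames=MAX_FRAMES):
    """Drop utterances whose frame count meets or exceeds the cutoff
    (mirrors the `if int(n_frame) >= 1000` check)."""
    return [u for u in utterances if u.shape[0] < max_frames]

def padded_batch_frames(utterances):
    """Total frames a zero-padded batch occupies: every utterance is
    padded up to the length of the longest one."""
    return len(utterances) * max(u.shape[0] for u in utterances)

# Toy corpus: three short clips plus one very long outlier (80 mel bins).
corpus = [np.zeros((n, 80)) for n in (300, 400, 350, 5000)]

print(padded_batch_frames(corpus))                     # 4 * 5000 = 20000
print(padded_batch_frames(filter_utterances(corpus)))  # 3 * 400  = 1200
```

Dropping the one outlier cuts the padded batch from 20,000 to 1,200 frames while losing only one training example, which is the trade-off the author describes.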