What's the shape of the network's input?

Line 91 in 350e4b5

    
           output = net(X[preFetchBatchI*args.batchSize:(preFetchBatchI+1)*args.batchSize,:,:].permute(0,2,1), eps)

What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?
Is MFCC the feature used in your experiment?
Have you tried other tools to extract features, such as librosa?
Thanks!

What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?

I'm not sure I understand the question. chunk_len is determined by Kaldi when it creates the archives. It is the temporal dimension: the number of MFCC frames in the utterance.

Is MFCC the feature used in your experiment?

Yes, each input sample is a matrix: a sequence of MFCC feature vectors.
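
To make the shapes concrete, here is a small sketch of what the `permute(0,2,1)` in the quoted line does. It assumes X is stored as (batch_size, chunk_len, feat_dim), which the thread does not confirm. feat_dim = 30 is taken from the question, and batch_size = 4 and chunk_len = 200 are made-up values. Plain Python lists stand in for tensors so the snippet runs without PyTorch, and `permute_021` and `shape` are hypothetical helpers, not functions from the repo.

```python
def permute_021(batch):
    """Swap the last two axes of a nested list: (B, T, F) -> (B, F, T).

    Mirrors what tensor.permute(0, 2, 1) does for a 3-D tensor.
    """
    return [[list(col) for col in zip(*sample)] for sample in batch]


def shape(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical sizes: only feat_dim = 30 comes from the question above.
batch_size, chunk_len, feat_dim = 4, 200, 30

# Assumed storage layout: (batch_size, chunk_len, feat_dim).
X = [[[float(t * feat_dim + f) for f in range(feat_dim)]
      for t in range(chunk_len)]
     for _ in range(batch_size)]

net_input = permute_021(X)

print(shape(X))          # (4, 200, 30)
print(shape(net_input))  # (4, 30, 200)
```

Under these assumptions the network would see (batch_size, feat_dim, chunk_len), with chunk_len fixed by whatever Kaldi wrote into the archive.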

Have you tried other tools to extract features, such as librosa?

Not at the moment.

"loss is nan": how can I solve this?

The most common cause (in this repo) is the stats pooling layer. If all inputs are zero or identical, the variance is 0, which seems to result in a NaN loss.
Please use the -noiseEps option to avoid this.
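
A rough illustration of why zero variance can be a problem, and why a little noise helps. This is a sketch under assumptions, not the repo's code: it assumes stats pooling computes a per-dimension mean and standard deviation over time, that the NaN comes from the derivative of the square root at zero variance, and that the noise option adds small Gaussian noise scaled by eps to the input. `stats_pooling`, `sqrt_grad` and `add_noise` are hypothetical names.

```python
import math
import random


def stats_pooling(frames):
    """Per-dimension mean and standard deviation over the time axis.

    frames: list of chunk_len frames, each a list of feat_dim floats.
    Uses the population variance (an assumption).
    """
    n = len(frames)
    feat_dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(feat_dim)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n
                 for d in range(feat_dim)]
    return means, [math.sqrt(v) for v in variances]


def sqrt_grad(variance):
    """d sqrt(v) / dv = 1 / (2 * sqrt(v)), which is infinite at v == 0."""
    if variance == 0.0:
        return math.inf
    return 1.0 / (2.0 * math.sqrt(variance))


def add_noise(frames, eps, rng):
    """Add Gaussian noise scaled by eps to every value (assumed behaviour)."""
    return [[x + eps * rng.gauss(0.0, 1.0) for x in f] for f in frames]


# An all-zero (or constant) input has zero variance in every dimension.
silent = [[0.0] * 3 for _ in range(5)]
_, silent_stds = stats_pooling(silent)

# Chain rule: (infinite sqrt gradient) * (zero variance gradient) is NaN.
nan_term = sqrt_grad(0.0) * 0.0

# With a little noise the variance is no longer exactly zero.
noisy = add_noise(silent, eps=1e-3, rng=random.Random(0))
_, noisy_stds = stats_pooling(noisy)

print(silent_stds)
print(math.isnan(nan_term))
print(all(s > 0.0 for s in noisy_stds))
```

If this is indeed the mechanism, any input whose frames are all identical (digital silence, zero padding) would trigger it, which fits the description above.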