What's the shape of the network's input?
Closed this issue · 3 comments
nonday commented
pytorch_xvectors/train_xent.py
Line 91 in 350e4b5
- What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?
- Are MFCCs the features used in your experiment?
- Have you tried other tools to extract features, such as librosa?
Thanks!
manojpamk commented
- What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?
I'm not sure I understand the question. chunk_len is determined by Kaldi when creating the archives. It represents the temporal dimension: the number of MFCC frames in the utterance.
- Are MFCCs the features used in your experiment?
Yes, each input sample is a matrix - a sequence of MFCC features.
- Have you tried other tools to extract features, such as librosa?
Not at the moment.
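To make the (batch_size, feat_dim, chunk_len) layout concrete, here is a minimal sketch using NumPy rather than the repo's own data loader. The sizes are assumptions for illustration only: feat_dim=30 is taken from the question, while batch_size=4 and chunk_len=200 are made up (the real chunk_len comes from the Kaldi archives). Random values stand in for actual MFCCs.

```python
import numpy as np

# Hypothetical sizes, for illustration only. In practice chunk_len is fixed
# by Kaldi when the archives are created.
batch_size, feat_dim, chunk_len = 4, 30, 200

rng = np.random.default_rng(0)

# Each sample is a (feat_dim, chunk_len) matrix: one feature vector per frame.
# Random numbers are used here in place of real MFCCs.
samples = [
    rng.standard_normal((feat_dim, chunk_len)).astype(np.float32)
    for _ in range(batch_size)
]

# Stack along a new leading axis to form one batch.
batch = np.stack(samples, axis=0)
print(batch.shape)  # (4, 30, 200)
```

The last axis is the temporal one, so a longer chunk changes only the third dimension.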
nonday commented
"loss in nan", how to solve this?
manojpamk commented
The most common reason (in this repo) was the stats pooling layer. If all inputs are zero or identical, the variance is zero, which seems to result in a NaN loss.
Please use the -noiseEps option to avoid this.
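The effect of a small noise term on statistics pooling can be sketched as follows. This is not the repo's implementation: `stats_pooling` and `noise_eps` are hypothetical names, NumPy is used in place of PyTorch, and how -noiseEps is applied internally is an assumption here. The sketch only shows that constant input gives a standard deviation of exactly zero over time, and that adding small Gaussian noise moves it away from zero.

```python
import numpy as np

def stats_pooling(x, noise_eps=0.0, rng=None):
    """Pool x of shape (batch, feat_dim, chunk_len) over the time axis.

    Returns the per-feature mean and standard deviation concatenated,
    with shape (batch, 2 * feat_dim). noise_eps is a hypothetical
    parameter that scales Gaussian noise added before pooling.
    """
    if noise_eps > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        x = x + noise_eps * rng.standard_normal(x.shape)
    mean = x.mean(axis=2)
    std = x.std(axis=2)
    return np.concatenate([mean, std], axis=1)

# An all-zero batch: every frame is identical, so the variance is zero.
constant = np.zeros((2, 30, 200))

pooled_plain = stats_pooling(constant)
pooled_noisy = stats_pooling(constant, noise_eps=1e-5,
                             rng=np.random.default_rng(0))

print(pooled_plain[:, 30:].max())        # 0.0
print(pooled_noisy[:, 30:].min() > 0.0)  # True
```

One plausible route to NaN during training is the square root inside the standard deviation, whose derivative is unbounded at zero. Keeping the variance slightly above zero avoids that point.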