mravanelli/pytorch-kaldi

Chunk Mean and Variance Normalization

timolohrenz opened this issue · 3 comments

Dear PyTorch-Kaldi Team,

In recent weeks I tried to deploy one of your trained TIMIT models in our own framework and was wondering why I was unable to achieve the same PER as when evaluating the model with the pytorch-kaldi toolkit; my results were always about 5% absolute off from the PyTorch-Kaldi PER. Instead of using chunks or batches, we call the model "one sequence at a time".

I double-checked that the module call receives the same arguments as in PyTorch-Kaldi, that the net inputs are the same features with the same precision and dimensions, that the outputs of the subnets are correctly passed on, etc., but the final net outputs always looked slightly different from those in PyTorch-Kaldi.

In the end, the difference was due to the function load_chunk, which always normalizes the means and variances of the input features over the whole chunk. Before I found this line (I should have looked there earlier -.-), I thought all normalization was done by the Kaldi functions (fea_opts=).
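For anyone hitting the same mismatch, this is roughly what chunk-level mean/variance normalization does: per feature dimension, the statistics are computed over all frames in the chunk and applied to every frame. This is a minimal numpy sketch; `normalize_chunk` and the `eps` guard are my own naming for illustration, not the exact pytorch-kaldi code.

```python
import numpy as np

def normalize_chunk(fea, eps=1e-8):
    """Mean/variance-normalize a feature chunk over its time axis.

    fea: 2-D array of shape (num_frames, feat_dim).
    Returns an array of the same shape with (approximately) zero mean
    and unit variance per feature dimension.
    """
    # Statistics are computed over ALL frames in the chunk, so the result
    # depends on which frames happen to share a chunk at evaluation time.
    mean = np.mean(fea, axis=0)
    std = np.std(fea, axis=0)
    return (fea - mean) / (std + eps)

# Example: a chunk of 200 frames with 40-dimensional features
chunk = np.random.randn(200, 40) * 3.0 + 5.0
norm = normalize_chunk(chunk)
```

This also makes clear why a "one sequence at a time" deployment sees different inputs: the statistics are then computed per utterance rather than per chunk, so the network receives slightly shifted/scaled features.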

Even though this isn't an issue by itself, as this processing is also mentioned in the PyTorch-Kaldi paper, it might help others to raise awareness of this function and perhaps make it a configurable option, since normalizing over the whole test set is not always possible when deploying a model.

Best regards,
Timo

This is a good point. In SpeechBrain, normalisation is a specific and well-documented step. We won't add such an option to PyTorch-Kaldi at this stage of the project, but thank you for sharing your experience; it could help others!

Thanks for your quick response. I am excited to get my hands on SpeechBrain as soon as it's public.

Keep up the good work!

Should be out around this Fall :D