alumae/gst-kaldi-nnet2-online

Questions regarding nnet2 decoding on gstreamer server

feddyfedfed opened this issue · 1 comments

If the nnet3 decoding implementation is based on online2bin/online2-wav-nnet3-latgen-faster.cc, is the nnet2 implementation likewise based on online2-wav-nnet2-latgen-faster? And how does the server's performance with nnet2 compare with nnet-latgen-faster?

I've been evaluating our models using the server and online2-wav-nnet2-latgen-faster, but I can't get the two to perform even roughly the same. My basis for comparison is a set of utterances that are perfectly decoded by competing systems and perfectly decoded by online2-wav-nnet2-latgen-faster, but consistently decoded with errors by the server. I understand that there will be some discrepancies and variation in performance caused by dithering, but is it also possible that the server processes the speech data differently from what happens in offline decoding?

Yes, the nnet2 implementation is based on online2-wav-nnet2-latgen-faster, and the two should give similar performance. If there are large differences, it could be because of a bug. If you could provide a model and the audio files, and show the differences, we could try to debug it.
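
One way to show the differences is to list, per utterance, how far the server transcript is from the offline one. The following is only a minimal sketch in plain Python, not part of Kaldi or the server: the function names `word_edit_distance` and `compare_transcripts`, the utterance IDs, and the sample transcripts are all hypothetical, and it assumes you have already collected both sets of hypotheses into dictionaries keyed by utterance ID.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two transcripts, counted in words."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the distance between the first i-1 words of r
    # and the first j words of h.
    prev = list(range(len(h) + 1))
    for i, ref_word in enumerate(r, start=1):
        cur = [i]
        for j, hyp_word in enumerate(h, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def compare_transcripts(offline, server):
    """Return {utt_id: distance} for utterances where the outputs differ.

    Only utterance IDs present in both dictionaries are compared.
    """
    diffs = {}
    for utt_id in sorted(offline.keys() & server.keys()):
        distance = word_edit_distance(offline[utt_id], server[utt_id])
        if distance:
            diffs[utt_id] = distance
    return diffs


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    offline = {"utt1": "hello world", "utt2": "good morning"}
    server = {"utt1": "hello world", "utt2": "good mourning all"}
    for utt_id, distance in compare_transcripts(offline, server).items():
        print(utt_id, distance, offline[utt_id], "|", server[utt_id])
```

A list like this, together with the corresponding audio files and the model, narrows down which utterances to look at first.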