alumae/gst-kaldi-nnet2-online

Problem with NNet3 model

FredPraca opened this issue · 4 comments

Hello guys,
I'm trying to use the plugin with the following pipeline:

gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
               audioconvert ! \
               audioresample ! tee name=t ! queue ! \
               kaldinnet2onlinedecoder \
               use-threaded-decoder=0 \
               nnet-mode=3 \
               model=/opt/models/fr/final.mdl \
               word-syms=/opt/models/fr/words.txt \
               fst=/opt/models/fr/HCLG.fst \
               mfcc-config=/opt/models/fr/mfcc_hires.conf \
               ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
               phone-syms=/opt/models/fr/phones.txt \
               frame-subsampling-factor=3 \
               max-active=7000 \
               beam=13.0 \
               lattice-beam=8.0 \
               acoustic-scale=1 \
               do-endpointing=1 \
               endpoint-silence-phones=\"1:2:3:4:5:16:17:18:19:20\" \
               traceback-period-in-secs=0.25 \
               num-nbest=2 \
               chunk-length-in-secs=0.25 \
               ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0

The problem is that I get the following assertion failure:

ASSERTION_FAILED ([5.2]:AdvanceChunk():decodable-online-looped.cc:223) : 'current_log_post_.NumRows() == info_.frames_per_chunk / info_.opts.frame_subsampling_factor && current_log_post_.NumCols() == info_.output_dim'

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::DecodableNnetLoopedOnlineBase::AdvanceChunk()
kaldi::nnet3::DecodableNnetLoopedOnlineBase::EnsureFrameIsComputed(int)
kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int)
kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
kaldi::SingleUtteranceNnet3Decoder::AdvanceDecoding()
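
To make the failed check easier to read, here is a minimal Python sketch of the condition in the assertion message. It only mirrors the expression shown above, assuming integer division as in C++. The function name and the numbers in the example are hypothetical and are not taken from Kaldi or from this model.

```python
def chunk_shape_ok(num_rows, num_cols, frames_per_chunk,
                   frame_subsampling_factor, output_dim):
    """Mirror of the asserted condition (hypothetical helper, not Kaldi code)."""
    # Rows of the log-posterior matrix must match the subsampled chunk size.
    expected_rows = frames_per_chunk // frame_subsampling_factor
    # Columns must match the output dimension of the network.
    return num_rows == expected_rows and num_cols == output_dim


# Hypothetical numbers: a 21-frame chunk with a subsampling factor of 3
# should yield 7 rows of output.
print(chunk_shape_ok(7, 100, 21, 3, 100))   # shape consistent with factor 3
print(chunk_shape_ok(21, 100, 21, 3, 100))  # shape as if no subsampling applied
```

If the network output and the decoder options disagree about the subsampling factor, the row counts differ in this way. Whether that is what actually happens here is not established in this thread.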

The curious thing is that it works when used with the Kaldi GStreamer Server.
I can avoid this assertion by removing frame-subsampling-factor, but in that case decoding takes a really long time and I get a warning about the lattice beam.

Any ideas?

For the sake of completeness, I should also say that it fails the same way with the gui-demo code.

Good news: I got it working, although the fix is not a great one.
The problem is the order in which the parameters are given.
When fst and model are placed last, it works.

gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
               audioconvert ! \
               audioresample ! tee name=t ! queue ! \
               kaldinnet2onlinedecoder \
               use-threaded-decoder=0 \
               nnet-mode=3 \
               word-syms=/opt/models/fr/words.txt \
               mfcc-config=/opt/models/fr/mfcc_hires.conf \
               ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
               phone-syms=/opt/models/fr/phones.txt \
               frame-subsampling-factor=3 \
               max-active=7000 \
               beam=13.0 \
               lattice-beam=8.0 \
               acoustic-scale=1 \
               do-endpointing=1 \
               endpoint-silence-phones=1:2:3:4:5:16:17:18:19:20 \
               traceback-period-in-secs=0.25 \
               num-nbest=2 \
               chunk-length-in-secs=0.25 \
               fst=/opt/models/fr/HCLG.fst \
               model=/opt/models/fr/final.mdl \
               ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0

I think it's still a problem anyway.
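
For anyone who builds the pipeline from code, the workaround can be sketched as a small Python helper that moves fst and model to the end of the property list. This is only an illustration of the ordering trick described above: the helper names are hypothetical, and the reason the order matters (presumably the plugin reads the other options at the moment these two properties are set) is an assumption that this thread does not confirm.

```python
# Properties that the workaround places last, in this order.
LOAD_LAST = ("fst", "model")


def order_properties(props):
    """Return (name, value) pairs with fst and model moved to the end.

    Hypothetical helper; other properties keep their original order.
    """
    early = [(k, v) for k, v in props.items() if k not in LOAD_LAST]
    late = [(k, props[k]) for k in LOAD_LAST if k in props]
    return early + late


def to_launch_fragment(element, props):
    """Build the element part of a gst-launch-1.0 pipeline description."""
    parts = [element] + [f"{k}={v}" for k, v in order_properties(props)]
    return " ".join(parts)


# Shortened property set using the paths from this issue.
props = {
    "nnet-mode": 3,
    "model": "/opt/models/fr/final.mdl",
    "fst": "/opt/models/fr/HCLG.fst",
    "frame-subsampling-factor": 3,
}
print(to_launch_fragment("kaldinnet2onlinedecoder", props))
```

The same ordered list could be applied with set_property calls when the element is created programmatically instead of through gst-launch-1.0.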

Thank you for posting your workaround.

The problem is that the fix only covers the server version, whereas we use it directly as a GStreamer plugin.
So I would say that this issue is not resolved for the plugin.