Problem with NNet3 model

Question

FredPraca opened this issue 7 years ago · 4 comments

Hello guys,
I'm trying to use the plugin with the following pipeline:

gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
               audioconvert ! \
               audioresample ! tee name=t ! queue ! \
               kaldinnet2onlinedecoder \
               use-threaded-decoder=0 \
               nnet-mode=3 \
               model=/opt/models/fr/final.mdl \
               word-syms=/opt/models/fr/words.txt \
               fst=/opt/models/fr/HCLG.fst \
               mfcc-config=/opt/models/fr/mfcc_hires.conf \
               ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
               phone-syms=/opt/models/fr/phones.txt \
               frame-subsampling-factor=3 \
               max-active=7000 \
               beam=13.0 \
               lattice-beam=8.0 \
               acoustic-scale=1 \
               do-endpointing=1 \
               endpoint-silence-phones=\"1:2:3:4:5:16:17:18:19:20\" \
               traceback-period-in-secs=0.25 \
               num-nbest=2 \
               chunk-length-in-secs=0.25 \
               ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0

The problem is that I get the following assertion failure:

ASSERTION_FAILED ([5.2]:AdvanceChunk():decodable-online-looped.cc:223) : 'current_log_post_.NumRows() == info_.frames_per_chunk / info_.opts.frame_subsampling_factor && current_log_post_.NumCols() == info_.output_dim'

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::DecodableNnetLoopedOnlineBase::AdvanceChunk()
kaldi::nnet3::DecodableNnetLoopedOnlineBase::EnsureFrameIsComputed(int)
kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int)
kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
kaldi::SingleUtteranceNnet3Decoder::AdvanceDecoding()

The curious thing is that it works when used with the Kaldi GStreamer Server.
I can avoid this assertion by removing frame-subsampling-factor, but in that case decoding takes a really long time and I get a warning about the lattice beam.

Any ideas?

Answer 1 · 2018-01-22T12:44:27.000Z

For the sake of completeness, I should also say that it fails the same way with the gui-demo code.

Answer 2 · 2018-01-23T16:14:55.000Z

Good news: I got it working, although the fix is not a very satisfying one.
The problem lies in the order in which the parameters are given.
When fst and model are placed last, it works:

gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
               audioconvert ! \
               audioresample ! tee name=t ! queue ! \
               kaldinnet2onlinedecoder \
               use-threaded-decoder=0 \
               nnet-mode=3 \
               word-syms=/opt/models/fr/words.txt \
               mfcc-config=/opt/models/fr/mfcc_hires.conf \
               ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
               phone-syms=/opt/models/fr/phones.txt \
               frame-subsampling-factor=3 \
               max-active=7000 \
               beam=13.0 \
               lattice-beam=8.0 \
               acoustic-scale=1 \
               do-endpointing=1 \
               endpoint-silence-phones=1:2:3:4:5:16:17:18:19:20 \
               traceback-period-in-secs=0.25 \
               num-nbest=2 \
               chunk-length-in-secs=0.25 \
               fst=/opt/models/fr/HCLG.fst \
               model=/opt/models/fr/final.mdl \
               ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0

I think this is still a problem, though.
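
For anyone scripting this, here is a minimal sketch of the reordering workaround described above. It is a small POSIX shell helper (the name order_decoder_props is my own invention, not something the plugin provides) that takes the decoder's key=value parameters in any order and prints them with fst and model moved to the end. It only reorders arguments. It assumes, as observed above, that putting those two parameters last is what avoids the assertion, and that no value contains whitespace.

```shell
# Hypothetical helper (not part of the plugin): print the given key=value
# parameters one per line, with fst= and model= moved to the end.
# The body runs in a subshell "( ... )" so its variables stay local.
order_decoder_props() (
    fst_prop=""
    model_prop=""
    for prop in "$@"; do
        case "$prop" in
            fst=*)   fst_prop=$prop ;;        # hold back until the end
            model=*) model_prop=$prop ;;      # hold back until the end
            *)       printf '%s\n' "$prop" ;; # keep the original relative order
        esac
    done
    if [ -n "$fst_prop" ]; then printf '%s\n' "$fst_prop"; fi
    if [ -n "$model_prop" ]; then printf '%s\n' "$model_prop"; fi
)
```

The output can be spliced into the pipeline right after kaldinnet2onlinedecoder with an unquoted command substitution, which relies on shell word splitting and therefore on the no-whitespace assumption. All other parameters keep the order in which they were passed.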

Answer 3 · 2018-02-16T21:07:41.000Z

Thank you for posting your workaround.

Answer 4 · 2018-02-19T10:20:03.000Z

The problem is that the fix only addresses the server version, whereas we use it directly as a GStreamer plugin.
So I would say that this issue is not resolved for the plugin itself.