sphinx_lm_convert generates a bad bin file from a dmp file
daimoc opened this issue · 1 comment
Hi,
We found a bug in the bin file generated by sphinx_lm_convert.
We used the latest git version of sphinx_lm_convert (commit 2643838).
The file was generated with this command:
sphinx_lm_convert -i fr.lm.dump -o fr.lm.bin
The generated bin file produces an exception in the sphinx4 DemoRunner:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 61121
at edu.cmu.sphinx.linguist.language.ngram.trie.BinaryLoader.readWords(BinaryLoader.java:151)
at edu.cmu.sphinx.linguist.language.ngram.trie.NgramTrieModel.allocate(NgramTrieModel.java:233)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:334)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:164)
at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:52)
at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:39)
at edu.cmu.sphinx.demo.transcriber.TranscriberDemo.main(TranscriberDemo.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at edu.cmu.sphinx.demo.DemoRunner.main(DemoRunner.java:44)
Comparing a working bin file with the crashing one for fr.small.ml,
we found a difference of only one bit, near the start of the file.
In the working bin file the 33rd byte is 0x01; in the generated, crashing file it is 0x00.
After editing the crashing file to fix this byte, the bin file works fine with sphinx.
So we think there is an issue in the ngram_model_trie_write_bin function.
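For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the two steps described above: listing the byte offsets where two bin files differ, and writing a patched copy with one byte changed. The helper names `diff_offsets` and `patch_byte` and the file names in the usage note are hypothetical and not part of sphinxbase. It also assumes that "33rd byte" is counted from 1, which would make it offset 32 counting from 0.

```python
import shutil
from pathlib import Path


def diff_offsets(path_a, path_b):
    """Return (offset, byte_a, byte_b) for every position where the files differ.

    Offsets are 0-based. Both files are read fully into memory.
    """
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    if len(a) != len(b):
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)} bytes")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def patch_byte(src, dst, offset, value):
    """Copy src to dst, then overwrite one byte of dst at a 0-based offset.

    The source file is left untouched so the broken output stays
    available for further debugging.
    """
    shutil.copyfile(src, dst)
    with open(dst, "r+b") as f:
        f.seek(offset)
        f.write(bytes([value]))
```

With a known-good file `working.bin` and the generated `fr.lm.bin` (both names are placeholders), `diff_offsets("working.bin", "fr.lm.bin")` lists the differing positions, and `patch_byte("fr.lm.bin", "fr.lm.fixed.bin", 32, 0x01)` writes a copy with the byte set to 0x01, under the 1-based counting assumption stated above. This is only a workaround for inspecting the problem, not a replacement for the fix in the writer.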
Thanks.
This bug was fixed in sphinxbase, thank you for the report!