sphinx_lm_convert raises exception from missing backoff weight in unigram

Question

Halle opened this issue 9 years ago · 3 comments

Hi all,

Nickolay's previously posted issue reminded me of something I noticed last week (and then worked around and forgot): the current sphinx_lm_convert raises an exception and stops if it is given an ARPA file with no backoff weight on a unigram (and, I think, also on a bigram). However, a couple of LM tools generate ARPA models that do not print a backoff weight for the end-of-sequence token (the closing </s> tag, which GitHub erases when I post it here because it thinks it is HTML), so the converter rejects what I think are valid models.

As a workaround, I am using sed to modify my models so that the end-of-sequence unigrams and bigrams have a 0.0 backoff weight before I submit them to sphinx_lm_convert, but I suppose the converter could handle this itself when it encounters an end-of-sequence token without a backoff weight. As expected, it does not choke on trigrams without a backoff weight. Thanks!
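For anyone hitting the same error, here is a minimal Python sketch of this kind of workaround. It is not the sed command from the post, which is not shown here. The function name `add_eos_backoff` and its `eos` and `weight` parameters are hypothetical. It assumes a conventional ARPA layout: `ngram N=count` lines in the header, `\N-grams:` section markers, and whitespace-separated fields with the log probability first and an optional backoff weight last.

```python
import re

# Matches section markers such as "\1-grams:" and header counts such as "ngram 2=15".
SECTION_RE = re.compile(r"^\\(\d+)-grams:$")
COUNT_RE = re.compile(r"^ngram\s+(\d+)\s*=\s*\d+$")


def add_eos_backoff(lines, eos="</s>", weight="0.0"):
    """Return ARPA lines where n-grams ending in `eos` carry an explicit backoff weight.

    Only orders below the highest order are touched, because the highest
    order never has backoff weights.
    """
    lines = [line.rstrip("\n") for line in lines]

    # Find the highest n-gram order declared in the header.
    orders = [int(m.group(1)) for m in (COUNT_RE.match(l.strip()) for l in lines) if m]
    max_order = max(orders, default=0)

    out = []
    order = 0  # 0 means "not inside an n-gram section"
    for line in lines:
        stripped = line.strip()
        header = SECTION_RE.match(stripped)
        if header:
            order = int(header.group(1))
        elif stripped == "\\end\\":
            order = 0
        elif 0 < order < max_order and stripped:
            fields = stripped.split()
            # order + 1 fields means: log probability, the words, and no backoff weight.
            if len(fields) == order + 1 and fields[-1] == eos:
                line = line.rstrip() + "\t" + weight
        out.append(line)
    return out
```

To use it, read the model with `open(path, encoding="utf-8")`, pass the file object to `add_eos_backoff`, write the returned lines to a new file joined with newlines, and give that file to sphinx_lm_convert. The sketch only patches entries ending in the end-of-sequence token, to match the workaround described above.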

Answer 1 · 2015-07-19T09:32:21.000Z

Yeah, handling such models properly was a requirement; I'm really not sure why it is not supported. We'll fix it as soon as sf is back.

Answer 2 · 2015-07-19T12:14:52.000Z

Ah, your model wasn't yet attached when I read your issue – now I see that it's the same issue, sorry for the dupe.

Answer 3 · 2015-07-19T15:03:01.000Z

Yes, basically a duplicate of #13