sphinx_lm_convert raises exception from missing backoff weight in unigram
Halle opened this issue · 3 comments
Hi all,
Nickolay's previously posted issue reminded me of something I noticed last week (and then worked around and forgot): the current sphinx_lm_convert raises an exception and stops if given an ARPA file with no backoff weight on a unigram (and, I think, also on a bigram). However, a couple of LM tools generate ARPA models that do not print a backoff weight for the end-of-sequence token (the closing 's' tag, </s>), so the converter rejects what I think are valid models. As a workaround, I am modifying my models with sed so that the end-of-sequence unigrams and bigrams have a 0.0 weight before I submit them to sphinx_lm_convert, but I suppose the converter could handle this itself when it encounters an end-of-sequence token without a backoff weight. Naturally, it does not choke on trigrams without a backoff weight. Thanks!
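For anyone hitting the same thing, here is a minimal Python sketch of that kind of workaround. It is not the sed command mentioned above, and the function name `add_missing_eos_backoff` is hypothetical. It assumes a standard ARPA layout (a `\data\` header with `ngram N=count` lines, `\N-grams:` section markers, whitespace-separated fields) and that only entries ending in the end-of-sequence token need patching.

```python
import re


def add_missing_eos_backoff(arpa_text, eos="</s>", weight="0.0"):
    """Append a backoff weight to n-grams ending in `eos` that lack one.

    Hypothetical helper: assumes the usual ARPA layout. The highest-order
    section is left untouched, since it never carries backoff weights.
    """
    lines = arpa_text.splitlines()

    # Find the highest n-gram order declared in the \data\ header.
    max_order = 0
    for line in lines:
        m = re.match(r"ngram\s+(\d+)\s*=", line.strip())
        if m:
            max_order = max(max_order, int(m.group(1)))

    fixed = []
    order = 0  # 0 means "not inside an n-gram section"
    for line in lines:
        stripped = line.strip()
        section = re.fullmatch(r"\\(\d+)-grams:", stripped)
        if section:
            order = int(section.group(1))
        elif stripped == "\\end\\":
            order = 0
        elif 0 < order < max_order and stripped:
            fields = stripped.split()
            # logprob + `order` words and nothing else: backoff is missing.
            if len(fields) == order + 1 and fields[-1] == eos:
                line = line.rstrip() + "\t" + weight
        fixed.append(line)
    return "\n".join(fixed) + "\n"
```

One way to use it would be to read the model into a string, pass it through the function, write the result to a new file, and hand that file to sphinx_lm_convert. Entries that already have a backoff weight are left as they are.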
Yeah, handling such models properly was a requirement; I'm really not sure why it is not supported. We'll fix it as soon as sf is back.
Ah, your model wasn't yet attached when I read your issue – now I see that it's the same issue, sorry for the dupe.