mittagessen/kraken

Quality of kraken confidence measures

bertsky opened this issue · 1 comment

When comparing Kraken text models against similar architectures and training data in Calamari and Tesseract, I noticed that the confidences Kraken returns are overconfident. This hampers multi-OCR efforts (i.e. combining hypotheses from multiple engines/models by voting and confidence). I wonder if there is anything I can do on the training side to get better probability estimates?

CTC tends to produce bimodal confidence distributions in general and can't really be coerced to do otherwise (although there are a couple of papers that try). Calamari and Kraken use the same implementation from cuDNN, so there shouldn't be much difference there, apart from whatever interpolation they do in their ensembling. Tesseract's implementation is slightly different because it was written by the same person as the ocropy CTC loss, which is slightly smoother than standard CTC (and they incorporate dictionary data into the confidence measure).
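To illustrate why CTC confidences look bimodal: the per-frame softmax outputs cluster near 0 and 1. One common post-hoc remedy (not something either engine does out of the box, just a sketch) is temperature scaling, which flattens the softmax before reading off a confidence. The function name and values here are hypothetical:

```python
import numpy as np

def soften_confidences(logits, temperature=2.0):
    """Apply temperature scaling to per-frame logits before softmax.

    A temperature > 1 flattens the distribution, pulling CTC's
    near-0/near-1 confidences away from the extremes. The temperature
    would have to be calibrated on held-out data.
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    # numerically stable softmax over the class axis
    scaled -= scaled.max(axis=-1, keepdims=True)
    probs = np.exp(scaled)
    return probs / probs.sum(axis=-1, keepdims=True)

# A single frame with one dominant class: with T=1 the top probability
# is nearly saturated; with T=2 it is noticeably softer.
frame = [8.0, 1.0, 0.5]
print(soften_confidences(frame, temperature=1.0).max())
print(soften_confidences(frame, temperature=2.0).max())
```

This only rescales the reported confidence; it does not change the decoded text, since the argmax per frame is unchanged.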

Anyway, in general you shouldn't treat the reported confidences of different OCR engines as living in the same metric space. Couldn't you just normalize them to bring the results closer to what you're expecting?
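One way to do that normalization, sketched below under the assumption that you have a batch of per-character or per-line scores from each engine: rank-normalize each engine's scores onto [0, 1], which throws away the incomparable absolute scales and keeps only the relative ordering. The function and the example score values are hypothetical:

```python
import numpy as np

def rank_normalize(scores):
    """Map one engine's raw confidence scores onto [0, 1] by empirical rank.

    Different engines report confidences on incomparable scales; ranking
    within each engine keeps only the relative ordering, so a voting
    scheme can compare the normalized scores across engines.
    """
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()       # rank of each score within the batch
    return ranks / max(len(scores) - 1, 1)   # scale ranks to [0, 1]

kraken_scores = [0.99, 0.97, 0.999, 0.5]     # typically near-saturated
tesseract_scores = [0.80, 0.60, 0.90, 0.30]  # spread over a different range
print(rank_normalize(kraken_scores))
print(rank_normalize(tesseract_scores))
```

After normalization, both score lists map to the same [0, 1] grid, so a voter can weight hypotheses from the two engines on equal footing. A quantile mapping against a held-out calibration set would be a smoother variant of the same idea.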