Calamari-OCR/calamari

hidden error & Average sentence confidence & confidence voting

Tailor2019 opened this issue · 3 comments

Hello!
@andbue @ChWick

After fine-tuning Calamari with a pretrained model, the results contain the following values:
-hidden error
-Average sentence confidence
-confidence voting
I don't understand what they mean or which expressions are used to calculate them.
Could you please explain their meaning and how they are computed?

Thanks a lot for your continued help.

-hidden error

for i, ((gt, pred), count) in enumerate(keys):
    gt_fmt = "{" + gt + "}"
    pred_fmt = "{" + pred + "}"
    if i == n_confusions:
        break
    percent = count * max(len(gt), len(pred)) / r["total_sync_errs"]
    print("{:8s} {:8s} {:8d} {:10.2%}".format(gt_fmt, pred_fmt, count, percent))
    total_percent += percent
print("The remaining but hidden errors make up {:.2%}".format(1.0 - total_percent))

("Hidden" errors are the ones that are not listed in the table.)
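To make the calculation concrete, here is a minimal self-contained sketch of the same idea. The function name `confusion_table`, the sample counts, and the way `total_sync_errs` is derived here (the sum of `count * max(len(gt), len(pred))` over all confusions) are assumptions for illustration and are not taken from the Calamari code base.

```python
def confusion_table(confusions, total_sync_errs, n_confusions=10):
    """Return the listed rows and the share of errors hidden from the table.

    confusions maps (gt, pred) pairs to how often that confusion occurred.
    n_confusions is assumed to be a non-negative number of rows to list.
    """
    rows = []
    total_percent = 0.0
    # Most frequent confusions first, as in the printed table.
    ranked = sorted(confusions.items(), key=lambda item: -item[1])
    for (gt, pred), count in ranked[:n_confusions]:
        # A confusion is weighted by the longer of the two strings.
        percent = count * max(len(gt), len(pred)) / total_sync_errs
        rows.append((gt, pred, count, percent))
        total_percent += percent
    # Everything that did not fit into the table is "hidden".
    return rows, 1.0 - total_percent


# Made-up confusion counts, for illustration only.
confusions = {("e", "c"): 5, ("n", "u"): 3, ("", "."): 1, ("rn", "m"): 1}
# Assumption: the total is the weighted sum over all confusions.
total_sync_errs = sum(
    count * max(len(gt), len(pred)) for (gt, pred), count in confusions.items()
)

rows, hidden = confusion_table(confusions, total_sync_errs, n_confusions=2)
for gt, pred, count, percent in rows:
    print("{:8s} {:8s} {:8d} {:10.2%}".format("{" + gt + "}", "{" + pred + "}", count, percent))
print("The remaining but hidden errors make up {:.2%}".format(hidden))
```

With these made-up counts the total is 11, the two listed rows cover 8 of them, and the hidden share is 3/11, i.e. about 27%.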

-Average sentence confidence

avg_sentence_confidence += prediction.avg_char_probability

(An average over the confidences of all lines, where the confidence of a line is the average of the confidences of the characters in that line.)
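As a rough sketch of that two-level average, assuming each line is given as a plain list of per-character probabilities (the helper names and the sample numbers below are hypothetical, not Calamari's API):

```python
def avg_char_probability(char_probs):
    """Confidence of one line: the mean of its per-character probabilities."""
    return sum(char_probs) / len(char_probs) if char_probs else 0.0


def avg_sentence_confidence(lines):
    """Mean of the per-line confidences over all lines."""
    if not lines:
        return 0.0
    return sum(avg_char_probability(probs) for probs in lines) / len(lines)


# Two made-up lines with per-character probabilities.
lines = [[0.9, 0.8, 1.0], [0.5, 0.7]]
result = avg_sentence_confidence(lines)
print("Average sentence confidence: {:.2%}".format(result))
```

Here the line confidences are 0.9 and 0.6, so the result is 0.75. Note that this is not the same as averaging over all five characters at once (which would give 0.78), because every line counts equally regardless of its length.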

-confidence voting

https://arxiv.org/abs/1711.09670
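For a quick intuition, the general idea of voting by confidence can be sketched as follows. This is a simplified illustration and not the implementation from the paper or from Calamari: it assumes the outputs of all voters are already aligned position by position and that each voter contributes only its top character with one confidence value. The function name `confidence_vote` and the sample data are hypothetical; the paper's method may differ in detail, for example in how alignment and alternative characters are handled.

```python
from collections import defaultdict


def confidence_vote(aligned_predictions):
    """Pick, per position, the character with the highest summed confidence.

    aligned_predictions is a list of voters; each voter is a list of
    (char, confidence) pairs, and all voters have the same length.
    """
    if not aligned_predictions:
        return ""
    result = []
    for pos in range(len(aligned_predictions[0])):
        scores = defaultdict(float)
        for voter in aligned_predictions:
            char, conf = voter[pos]
            scores[char] += conf  # accumulate confidence per candidate
        result.append(max(scores, key=scores.get))
    return "".join(result)


# Three made-up voters that disagree on the first character.
voters = [
    [("c", 0.55), ("a", 0.9), ("t", 0.8)],
    [("e", 0.95), ("a", 0.8), ("t", 0.9)],
    [("c", 0.30), ("a", 0.7), ("t", 0.7)],
]
voted = confidence_vote(voters)
print(voted)
```

In this example a plain majority vote would choose "c" for the first position (two votes against one), whereas summing confidences favors "e" (0.95 against 0.85), so the result is "eat".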

Thanks!
@andbue
Is the hidden error the percentage of characters on which Calamari makes an error?

No. The percentage there is just to give you an idea of the share of errors that are not listed in the table. If it's low, then most of your errors are already in the table (i.e. a few kinds of errors are frequent). If it's high, the errors are spread over many kinds that are similar in frequency.