Hidden error, average sentence confidence, and confidence voting

Hello!
@andbue @ChWick

After fine-tuning Calamari from a pretrained model, the results report:
-hidden error
-Average sentence confidence
-confidence voting
I don't understand what these values mean or which expressions are used to calculate them. Could you please explain both?

Thanks a lot for your continued help.

-hidden error

calamari/calamari_ocr/scripts/eval.py

Lines 24 to 34 in 15afa29

    
    for i, ((gt, pred), count) in enumerate(keys):
        gt_fmt = "{" + gt + "}"
        pred_fmt = "{" + pred + "}"
        if i == n_confusions:
            break
        percent = count * max(len(gt), len(pred)) / r["total_sync_errs"]
        print("{:8s} {:8s} {:8d} {:10.2%}".format(gt_fmt, pred_fmt, count, percent))
        total_percent += percent

    print("The remaining but hidden errors make up {:.2%}".format(1.0 - total_percent))

("Hidden" are the ones that are not listed in the table)
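To make the arithmetic in the excerpt concrete, here is a minimal standalone sketch. The function name `hidden_error_share`, the dictionary input format, and the sample counts are made up for illustration; it also assumes that `keys` is sorted by descending count and that `total_sync_errs` equals the sum of `count * max(len(gt), len(pred))` over all confusions, which the excerpt itself does not show.

```python
def hidden_error_share(confusions, total_sync_errs, n_confusions):
    """Mimic the percentage arithmetic of the eval.py excerpt.

    confusions: dict mapping (gt, pred) -> count (hypothetical input format).
    Returns the listed table rows and the share of errors that stay hidden.
    """
    # Assumption: the table lists the most frequent confusions first.
    keys = sorted(confusions.items(), key=lambda item: -item[1])
    rows = []
    total_percent = 0.0
    for i, ((gt, pred), count) in enumerate(keys):
        if i == n_confusions:
            break
        # Each confusion is weighted by the longer of the two strings.
        percent = count * max(len(gt), len(pred)) / total_sync_errs
        rows.append((gt, pred, count, percent))
        total_percent += percent
    return rows, 1.0 - total_percent


# Made-up confusion counts, only for illustration.
confusions = {("e", "c"): 6, ("rn", "m"): 2, ("", "."): 1, ("l", "1"): 1}
# Assumed definition of the total: 6*1 + 2*2 + 1*1 + 1*1 = 12
total = sum(c * max(len(g), len(p)) for (g, p), c in confusions.items())

rows, hidden = hidden_error_share(confusions, total, n_confusions=2)
for gt, pred, count, percent in rows:
    print("{:8s} {:8s} {:8d} {:10.2%}".format("{" + gt + "}", "{" + pred + "}", count, percent))
print("The remaining but hidden errors make up {:.2%}".format(hidden))
```

With only the two most frequent confusions listed, the two remaining ones account for 2 of the 12 errors, so the hidden share in this toy example is about 16.67%.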

-Average sentence confidence

calamari/calamari_ocr/scripts/predict.py

Line 140 in f1cdbb4

avg_sentence_confidence += prediction.avg_char_probability

(The mean of the per-line confidences, where the confidence of a line is the mean of the confidences of the characters in that line.)
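As a rough illustration of that description, the sketch below computes a mean of per-line means. The helper names and the sample numbers are hypothetical, and it assumes the accumulated sum is finally divided by the number of predicted lines, which the quoted line alone does not show.

```python
def avg_char_probability(char_confidences):
    """Confidence of one line: mean of its per-character confidences."""
    if not char_confidences:
        return 0.0
    return sum(char_confidences) / len(char_confidences)


def average_sentence_confidence(lines):
    """Mean of the per-line confidences over all predicted lines."""
    if not lines:
        return 0.0
    total = 0.0
    for char_confidences in lines:
        # Same accumulation pattern as the quoted line from predict.py.
        total += avg_char_probability(char_confidences)
    # Assumption: the sum is divided by the number of lines.
    return total / len(lines)


# Made-up per-character confidences for two lines.
lines = [[0.9, 0.8, 1.0], [0.6, 0.8]]
print("Average sentence confidence: {:.2%}".format(average_sentence_confidence(lines)))
```

Note that under this reading every line counts equally, regardless of how many characters it contains.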

-confidence voting

https://arxiv.org/abs/1711.09670
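As I understand the linked paper, confidence voting combines the outputs of several models by adding up, for each aligned character position, the confidences that the models assign to each candidate character, and then picking the candidate with the highest sum. The sketch below shows only that summing step. It assumes the alignment between the model outputs has already been done, and the function name and input format are hypothetical, not Calamari's API.

```python
from collections import defaultdict


def confidence_vote(position_alternatives):
    """Pick one character for a single aligned position.

    position_alternatives: one dict per voter, mapping candidate char -> confidence
    (hypothetical format; the alignment of the voters' outputs is assumed done).
    """
    totals = defaultdict(float)
    for alternatives in position_alternatives:
        for char, confidence in alternatives.items():
            totals[char] += confidence
    # The candidate with the highest summed confidence wins.
    return max(totals, key=totals.get)


# Made-up confidences from three voters for one character position.
voters = [
    {"e": 0.55, "c": 0.45},
    {"c": 0.90, "e": 0.10},
    {"e": 0.60, "c": 0.40},
]
print(confidence_vote(voters))
```

In this toy example a plain majority vote over the top choices would pick "e" (two voters out of three), while the summed confidences favor "c" (1.75 against 1.25), because the one voter preferring "c" is much more certain than the other two.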

Thanks!
@andbue
Is the hidden error the percentage of characters on which Calamari makes an error?

No. The percentage there just gives you an idea of the share of errors that belong to confusions not listed in the table. If it's low, most of your errors are already in the table (i.e. a few kinds of errors are frequent). If it's high, the errors are spread over many kinds of confusions with similar frequency.
