OCRErrorStatistics

extraction and counting of OCR errors

following the error classes:

0.10: latin (ignore)

1.00: correct

0.99: self correction (~correct)

0.70: self correction (correct)

0.98: error (accent)

0.97: error (?)

0.96: error (impossible syllabic sequence)

0.95: error (single bad chars)

0.94: error (random chars)

0.90 - 0.71 (del - suggestions): ignore

error rate was counted by error/nonignored. NaN implies that the whole document was ignored and a division 0.00 took place.

mariamoritz/OCRErrorStatistics