Info: current statistics
wollmers opened this issue · 0 comments
Compared the original XML "ONB_newseye" against the current line texts in "AustrianNewspapers".
compare_xml.pl Version 0.01
Compare XML text output against ground truth (GRT):
XML: ONB_newseye
GRT: AustrianNewspapers
Summary:
             lines    words     chars
items ocr:   57541   326524   2198240   matches + inserts + substitutions
items grt:   57541   326394   2198051   matches + deletions + substitutions
matches:     23961   265356   2125325   matches
edits:       33580    61346     73806   inserts + deletions + substitutions
subss:       33580    60860     71835   substitutions
inserts:         0      308      1080   inserts
deletions:       0      178       891   deletions
precision:  0.4164   0.8127    0.9668   matches / (matches + substitutions + inserts)
recall:     0.4164   0.8130    0.9669   matches / (matches + substitutions + deletions)
accuracy:   0.4164   0.8122    0.9664   matches / (matches + substitutions + inserts + deletions)
f-score:    0.4164   0.8128    0.9669   (2 * recall * precision) / (recall + precision)
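
For anyone who wants to re-check the ratio rows, here is a minimal Python sketch that recomputes them from the counts in the summary. It only follows the formulas printed in the right-hand column; the function name `metrics` and its parameter names are made up for this illustration and are not taken from compare_xml.pl.

```python
def metrics(matches, substitutions, inserts, deletions):
    """Recompute the four ratio rows from the raw counts of one column.

    Hypothetical helper: it mirrors the formulas shown in the report,
    not the actual implementation in compare_xml.pl.
    """
    precision = matches / (matches + substitutions + inserts)
    recall = matches / (matches + substitutions + deletions)
    accuracy = matches / (matches + substitutions + inserts + deletions)
    f_score = (2 * recall * precision) / (recall + precision)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f-score": f_score,
    }


# Counts copied from the summary table above (lines, words, chars columns).
lines = metrics(matches=23961, substitutions=33580, inserts=0, deletions=0)
words = metrics(matches=265356, substitutions=60860, inserts=308, deletions=178)
chars = metrics(matches=2125325, substitutions=71835, inserts=1080, deletions=891)

for name, column in (("lines", lines), ("words", words), ("chars", chars)):
    print(name, {key: f"{value:.4f}" for key, value in column.items()})
```

Because the lines column has no inserts or deletions, all four ratios collapse to matches / items there, which is why that column shows the same value four times.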
Shortened list of the edits/mismatches:
Character match (confusion) table:
GRT => OCR   ratio   errors   count
---    ---   ------  -------  -------
'ſ' => 's'   0.9985    56885    56971
'⸗' => '-'   0.0052       61    11639
'⸗' => '='   0.3232     3762    11639
'⸗' => '¬'   0.6691     7788    11639
-----
SUM 68496
+ transcription 1000 estimated transcription level 1 -> 2
-----
TOTAL transcription 69496
edits 73806
- transcription -69496
-----
corrections 4310 (0.20% of all characters)
Rough guess of errors still in the GRT: 1000 - 2000.
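
The arithmetic behind the "corrections" figure can be retraced with a short Python sketch. All numbers are copied from the report above; the variable names are made up for this illustration. The denominator for the percentage is an assumption: the report only says "all characters", and the GRT character count is used here (the OCR count gives the same rounded value).

```python
# Confusion rows from the table above: (GRT char, OCR char, errors, count).
confusions = [
    ("ſ", "s", 56885, 56971),
    ("⸗", "-", 61, 11639),
    ("⸗", "=", 3762, 11639),
    ("⸗", "¬", 7788, 11639),
]

# The ratio column is errors / count for each row.
for grt_char, ocr_char, errors, count in confusions:
    print(f"'{grt_char}' => '{ocr_char}'  {errors / count:.4f}")

# Edits that only reflect a different transcription convention.
transcription_sum = sum(errors for _, _, errors, _ in confusions)

# Estimate taken from the report for the transcription level 1 -> 2 change.
estimated_level_change = 1000
total_transcription = transcription_sum + estimated_level_change

# Character-level edits from the summary table.
char_edits = 73806
corrections = char_edits - total_transcription

# Assumption: "all characters" means the GRT character count.
grt_chars = 2198051
corrections_share = corrections / grt_chars

print(transcription_sum, total_transcription, corrections)
print(f"{corrections_share:.2%}")
```

Separating the systematic transcription differences from the remaining edits this way is what leaves 4310 character edits as actual corrections.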