Resources for replication of the experiments in the HIP'21 paper "A Survey of OCR Evaluation Tools and Metrics".
TODO: Makefile and documentation
The data directory contains the Ground Truth, OCR and evaluation results.
All files are named by their unique 8-digit PRImA-ID followed by one or a combination of the following extensions:
- gt: Ground Truth (PAGE-XML)
- gt4hist: OCR results using the GT4HistOCR model (ALTO)
- deu, eng, est, fin, fra, lav, nld, pol, swe: OCR results using tessdata models (ALTO)
- conf: OCR confidence scores (TXT)
- dinglehopper: dinglehopper CER/WER report (JSON)
- ocrevalUAtion: ocrevalUAtion CER/WER/BoW report (HTML)
- ocrevalCER: ocreval CER report (TXT)
- ocrevalWER: ocreval WER report (TXT)
- primaCER: PRImA CER report (CSV)
- primaWER: PRImA WER report (CSV)
- primaBoW: PRImA BoW report (CSV)
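The naming scheme can be illustrated with a small Python sketch that splits a file name into its PRImA-ID and the extensions listed above. It rests on assumptions not stated in this README: that name parts are separated by dots and that a trailing file-type suffix (such as .xml or .txt) may follow. The function name parse_name and the sample file names are hypothetical; adjust the pattern to the actual files in the data directory.

```python
import re
from pathlib import Path

# Language codes of the tessdata models listed above.
TESSDATA_LANGS = {"deu", "eng", "est", "fin", "fra", "lav", "nld", "pol", "swe"}

# All extensions listed above.
KNOWN_EXTENSIONS = {
    "gt", "gt4hist", "conf", "dinglehopper", "ocrevalUAtion",
    "ocrevalCER", "ocrevalWER", "primaCER", "primaWER", "primaBoW",
} | TESSDATA_LANGS

# Assumption: an 8-digit ID followed by one or more dot-separated parts,
# e.g. "00000123.gt4hist.conf.txt" (hypothetical name).
NAME_PATTERN = re.compile(r"^(\d{8})((?:\.[A-Za-z0-9]+)+)$")


def parse_name(filename):
    """Return (prima_id, [known extensions]) or None if the name does not match."""
    match = NAME_PATTERN.match(Path(filename).name)
    if match is None:
        return None
    prima_id = match.group(1)
    parts = match.group(2).lstrip(".").split(".")
    # Keep only parts from the list above; other parts (e.g. "txt") are ignored.
    extensions = [part for part in parts if part in KNOWN_EXTENSIONS]
    return prima_id, extensions
```

For example, under these assumptions parse_name("data/00000123.gt4hist.conf.txt") yields the ID "00000123" with the extensions gt4hist and conf, while a name that does not start with an 8-digit ID yields None.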
TODO: PRImA Layout evaluation results
@inproceedings{DBLP:conf/icdar/Neudecker2021hip,
author = {Clemens Neudecker and
Konstantin Baierer and
Mike Gerber and
Christian Clausner and
Apostolos Antonacopoulos and
Stefan Pletschacher},
title = {A Survey of OCR Evaluation Tools and Metrics},
booktitle = {Proceedings of the 6th International Workshop on Historical Document Imaging and
Processing (HIP'21), Lausanne, Switzerland, September 6, 2021},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3476887.3476888}
}