hip21_ocrevaluation

Resources for replicating the experiments in the HIP'21 paper "A Survey of OCR Evaluation Tools and Metrics".

How to use

TODO: Makefile and documentation

Data

The data directory contains the Ground Truth, the OCR results and the evaluation results.

Each file name consists of the unique 8-digit PRImA-ID followed by one or more of the following extensions:

gt for Ground Truth (PAGE-XML)
gt4hist for OCR results using GT4HistOCR model (ALTO)
deu, eng, est, fin, fra, lav, nld, pol, swe for OCR results using tessdata models (ALTO)
conf for OCR confidence scores (TXT)
dinglehopper for dinglehopper CER/WER report (JSON)
ocrevalUAtion for ocrevalUAtion CER/WER/BoW report (HTML)
ocrevalCER for ocreval CER report (TXT)
ocrevalWER for ocreval WER report (TXT)
primaCER for PRImA CER report (CSV)
primaWER for PRImA WER report (CSV)
primaBoW for PRImA BoW report (CSV)
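
The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not part of the repository: it assumes the name parts are separated by dots and may be followed by a format suffix such as txt, neither of which is confirmed above. The function name parse_data_filename and the example ID 00123456 are hypothetical.

```python
import re
from pathlib import Path

# Extension labels and descriptions copied from the list above.
KNOWN_EXTENSIONS = {
    "gt": "Ground Truth (PAGE-XML)",
    "gt4hist": "OCR results using GT4HistOCR model (ALTO)",
    "conf": "OCR confidence scores (TXT)",
    "dinglehopper": "dinglehopper CER/WER report (JSON)",
    "ocrevalUAtion": "ocrevalUAtion CER/WER/BoW report (HTML)",
    "ocrevalCER": "ocreval CER report (TXT)",
    "ocrevalWER": "ocreval WER report (TXT)",
    "primaCER": "PRImA CER report (CSV)",
    "primaWER": "PRImA WER report (CSV)",
    "primaBoW": "PRImA BoW report (CSV)",
}
# The tessdata model codes all share one description.
for lang in ("deu", "eng", "est", "fin", "fra", "lav", "nld", "pol", "swe"):
    KNOWN_EXTENSIONS[lang] = "OCR results using tessdata models (ALTO)"


def parse_data_filename(filename):
    """Split a file name into (prima_id, known_labels, other_parts).

    Assumption: parts are dot-separated, e.g. <PRImA-ID>.<label>[.<label>...].
    Parts that are not in the list above (such as a format suffix) are
    returned separately instead of being rejected.
    """
    parts = Path(filename).name.split(".")
    prima_id = parts[0]
    if not re.fullmatch(r"\d{8}", prima_id):
        raise ValueError(f"not an 8-digit PRImA-ID: {prima_id!r}")
    known = [p for p in parts[1:] if p in KNOWN_EXTENSIONS]
    other = [p for p in parts[1:] if p not in KNOWN_EXTENSIONS]
    return prima_id, known, other


if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    prima_id, known, other = parse_data_filename("00123456.gt4hist.conf.txt")
    for label in known:
        print(prima_id, label, "->", KNOWN_EXTENSIONS[label])
```

If the actual files use a different separator, only the split call needs to change; the lookup table mirrors the list above.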

TODO: PRImA Layout evaluation results

How to cite

@inproceedings{DBLP:conf/icdar/Neudecker2021hip,
author    = {Clemens Neudecker and
             Konstantin Baierer and 
             Mike Gerber and
             Christian Clausner and
             Apostolos Antonacopoulos and
             Stefan Pletschacher},
title     = {A Survey of OCR Evaluation Tools and Metrics},
booktitle = {Proceedings of the 6th International Workshop on Historical Document Imaging and 
             Processing (HIP'21), Lausanne, Switzerland, September 6, 2021},
publisher = {{ACM}},
year      = {2021},
url       = {https://doi.org/10.1145/3476887.3476888}
}
