hip21_ocrevaluation

Resources for replication of the experiments in the HIP'21 paper "A Survey of OCR Evaluation Tools and Metrics".

How to use

TODO: Makefile and documentation

Data

The data directory contains the Ground Truth, the OCR outputs, and the evaluation results.

All files are named by their unique 8-digit PRImA-ID, followed by one of, or a combination of, the following extensions:

  • gt for Ground Truth (PAGE-XML)
  • gt4hist for OCR results using GT4HistOCR model (ALTO)
  • deu, eng, est, fin, fra, lav, nld, pol, swe for OCR results using tessdata models (ALTO)
  • conf for OCR confidence scores (TXT)
  • dinglehopper for dinglehopper CER/WER report (JSON)
  • ocrevalUAtion for ocrevalUAtion CER/WER/BoW report (HTML)
  • ocrevalCER for ocreval CER report (TXT)
  • ocrevalWER for ocreval WER report (TXT)
  • primaCER for PRImA CER report (CSV)
  • primaWER for PRImA WER report (CSV)
  • primaBoW for PRImA BoW report (CSV)
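As a sketch of how this naming scheme can be consumed, the snippet below groups files in the data directory by their PRImA-ID. It assumes (hypothetically; the exact on-disk names may differ) that each filename starts with the 8-digit ID followed by a dot-separated suffix chain such as "00434171.gt.xml" or "00434171.deu.dinglehopper.json":

```python
import re
from collections import defaultdict

# Assumed filename pattern: 8-digit PRImA-ID, a dot, then the
# extension chain described in the list above (e.g. "gt.xml").
FILENAME_RE = re.compile(r"^(\d{8})\.(.+)$")

def group_by_prima_id(filenames):
    """Map each PRImA-ID to the list of suffix chains found for it.

    Filenames that do not match the assumed pattern are skipped.
    """
    groups = defaultdict(list)
    for name in filenames:
        m = FILENAME_RE.match(name)
        if m:
            prima_id, suffixes = m.groups()
            groups[prima_id].append(suffixes)
    return dict(groups)
```

For example, `group_by_prima_id(["00434171.gt.xml", "00434171.deu.xml"])` collects both suffix chains under the ID `00434171`, which makes it easy to pair a Ground Truth file with its OCR and evaluation counterparts.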

TODO: PRImA Layout evaluation results

How to cite

@inproceedings{DBLP:conf/icdar/Neudecker2021hip,
author    = {Clemens Neudecker and
             Konstantin Baierer and 
             Mike Gerber and
             Christian Clausner and
             Apostolos Antonacopoulos and
             Stefan Pletschacher},
title     = {A Survey of OCR Evaluation Tools and Metrics},
booktitle = {Proceedings of the 6th International Workshop on Historical Document Imaging and 
             Processing (HIP'21), Lausanne, Switzerland, September 6, 2021},
publisher = {{ACM}},
year      = {2021},
url       = {https://doi.org/10.1145/3476887.3476888}
}