gt_structure_all

The gt_structure_all repository catalogues all of the individual Ground Truth Structure repositories, which together make up the OCR-D Ground Truth Structure corpus. The corpus contains data exclusively in PAGE XML format and captures the structural elements (segments/regions) of printed pages. It was established as part of the DFG project OCR-D.
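To give a rough idea of what the structural data looks like, the following minimal sketch counts the top-level regions of a single PAGE XML document using only the Python standard library. The inline sample document, the function name `count_regions`, and the helper `local_name` are illustrative assumptions and are not taken from any of the catalogued repositories. Real files may use a different PAGE schema version, so the sketch matches elements by local name rather than by a fixed namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-written sample; real ground truth files are much larger.
SAMPLE_PAGE_XML = """\
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.tif" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1" type="heading">
      <Coords points="100,100 900,100 900,200 100,200"/>
    </TextRegion>
    <TextRegion id="r2" type="paragraph">
      <Coords points="100,250 900,250 900,900 100,900"/>
    </TextRegion>
    <ImageRegion id="r3">
      <Coords points="100,950 900,950 900,1400 100,1400"/>
    </ImageRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_regions(page_xml):
    """Count the direct children of <Page> whose name ends in 'Region'.

    Nested regions are deliberately ignored to keep the sketch short.
    """
    root = ET.fromstring(page_xml)
    counts = Counter()
    for page in root:
        if local_name(page.tag) != "Page":
            continue
        for child in page:
            name = local_name(child.tag)
            if name.endswith("Region"):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for region_type, n in sorted(count_regions(SAMPLE_PAGE_XML).items()):
        print(region_type, n)
```

To try this on a file from one of the repositories, read the file contents into a string (or bytes) and pass them to `count_regions` in place of the sample.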

Data Repositories



All data records are also listed in Zenodo, so each has a DOI. When changes are made and a new release is created, the dataset is given a new DOI.

Access to the OCR-D datasets in Zenodo: https://zenodo.org/communities/ocr-d/records?q=&f=subject%3Aground-truth&l=list&p=1&s=10&sort=newest

Text Data

If you wish to incorporate text data into these structural datasets, please refer to the overview repository available at the following link: https://github.com/deutschestextarchiv/gt_structure_dtaText