The 'gt_structure_all' repository catalogues all individual Ground Truth Structure repositories, which together make up the OCR-D Ground Truth Structure corpus. The corpus contains data exclusively in PAGE XML format, capturing the structural elements (segments/regions) of printed pages. It was established as part of the DFG project OCR-D. The individual repositories are:
- https://OCR-D.github.io/gt_structure_1_1/
- https://OCR-D.github.io/gt_structure_1_2/
- https://OCR-D.github.io/gt_structure_1_3/
- https://OCR-D.github.io/gt_structure_1_4/
- https://OCR-D.github.io/gt_structure_2_1/
- https://OCR-D.github.io/gt_structure_2_2/
- https://OCR-D.github.io/gt_structure_2_3/
- https://OCR-D.github.io/gt_structure_2_4/
- https://OCR-D.github.io/gt_structure_3_1/
- https://OCR-D.github.io/gt_structure_3_2/
- https://OCR-D.github.io/gt_structure_3_3/
- https://OCR-D.github.io/gt_structure_4_1/
- https://OCR-D.github.io/gt_structure_4_2/
- https://OCR-D.github.io/gt_structure_4_3/
- https://OCR-D.github.io/gt_structure_5_1/
- https://OCR-D.github.io/gt_structure_5_2/
- https://OCR-D.github.io/gt_structure_5_3/
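
As a rough illustration of what "structural elements (segments/regions)" means in practice, the following minimal sketch counts the region types in a PAGE XML document. The sample XML is hand-written for this example and is not taken from the corpus; the namespace version, the region IDs, and the helper names `local_name` and `count_regions` are assumptions. Real files in these repositories may use a different PAGE schema version, which is why the sketch ignores the namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-written PAGE XML snippet (not from the corpus).
# The namespace version shown here is an assumption; real files may differ.
SAMPLE_PAGE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.tif" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1" type="heading">
      <Coords points="100,100 900,100 900,200 100,200"/>
    </TextRegion>
    <TextRegion id="r2" type="paragraph">
      <Coords points="100,250 900,250 900,1200 100,1200"/>
    </TextRegion>
    <ImageRegion id="r3">
      <Coords points="100,1250 900,1250 900,1450 100,1450"/>
    </ImageRegion>
  </Page>
</PcGts>
""".strip()


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def count_regions(page_xml: str) -> Counter:
    """Count all elements whose local name ends with 'Region'."""
    root = ET.fromstring(page_xml)
    return Counter(
        local_name(el.tag)
        for el in root.iter()
        if local_name(el.tag).endswith("Region")
    )


if __name__ == "__main__":
    for region_type, count in sorted(count_regions(SAMPLE_PAGE).items()):
        print(f"{region_type}: {count}")
```

To apply this to a file from one of the repositories, read the file's contents and pass them to `count_regions`; matching on the local tag name keeps the sketch independent of the schema version.
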
All data records are also listed in Zenodo and therefore have a DOI. When changes are made and a new release is created, the dataset receives a new DOI.
Access to the OCR-D datasets in Zenodo: https://zenodo.org/communities/ocr-d/records?q=&f=subject%3Aground-truth&l=list&p=1&s=10&sort=newest
To add text data to these structural datasets, see the overview repository: https://github.com/deutschestextarchiv/gt_structure_dtaText