Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

gt_structure_text_test

The OCR-D Ground Truth text and structure corpus was created between 2015 and 2017. Since 2017, the corpus has been further curated and supplemented with metadata where appropriate. It consists of PAGE XML files that include annotations of the text and structure. The data is based on transcriptions stored in the German Text Archive (DTA) (https://www.deutschestextarchiv.de/).

Metadata

Language: eng, fra, deu, heb, lat
Format: Page-XML
Time: 1500-1900
GT Type: data_structure_and_text
License: CC-BY-SA-4.0
Transcription Guidelines: OCR-D Ground Truth Guidelines (https://ocr-d.de/en/gt-guidelines/trans/)
Project: OCR-D
Project-URL: https://ocr-d.de/

Sources

The volume of transcriptions:

TextLine  Page  TxtRegion  GraphRegion
101       4     20         3
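
Counts like these can in principle be recomputed from the PAGE XML files themselves. The following is a minimal sketch, not the tooling used to produce the table: it assumes that the column names TxtRegion and GraphRegion abbreviate the PAGE XML elements TextRegion and GraphicRegion, and the function names and the inline sample document are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Element names to count. Assumption: "TxtRegion" and "GraphRegion" in the
# table stand for the PAGE XML elements TextRegion and GraphicRegion.
DEFAULT_NAMES = ("Page", "TextRegion", "GraphicRegion", "TextLine")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_page_elements(xml_text, names=DEFAULT_NAMES):
    """Count occurrences of the given element names in one PAGE XML document.

    Matching is done on the local tag name, so the sketch does not depend
    on which PAGE schema version (namespace) a file declares.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in names:
            counts[name] += 1
    return counts


# Hypothetical, heavily shortened sample; real files also carry coordinates,
# text content, and metadata.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.tif" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1"/>
      <TextLine id="l2"/>
    </TextRegion>
    <GraphicRegion id="g1"/>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    counts = count_page_elements(SAMPLE)
    for name in DEFAULT_NAMES:
        print(name, counts[name])
```

To cover a whole document, the per-file counters could be summed over all PAGE XML files in its directory; `Counter` objects support addition for this purpose.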

List of transcriptions

document                TxtRegion  ImgRegion  LineDrawRegion  GraphRegion  TabRegion  ChartRegion  SepRegion  MathRegion  ChemRegion  MusicRegion  AdRegion  NoiseRegion  UnknownRegion  CustomRegion  TextLine  Page
aepinus_bekentnis_1548  20         -          -               3            -          -            -          -           -           -            -         -            -              -             101       4

Extent

This section is reserved for additional information, instructions, or notes.