Ground truth for theological publications

License: Creative Commons Zero v1.0 Universal (CC0-1.0)

DigiTheo Ground Truth

This repository contains transcriptions of some journals that were digitized by the University Library of Tübingen as part of the DigiTheo project (http://idb.ub.uni-tuebingen.de/digitue/theo/).

Metadata

Language: deu
Format: Page-XML
Time: 1860-1872
GT Type: data_line
License: CC0 1.0
Transcription Guidelines: The transcriptions were done with eScriptorium, a transcription platform developed as part of the Scripta and RESILIENCE projects (https://gitlab.com/scripta/escriptorium/).
Project: Theologie digital
Project-URL: http://idb.ub.uni-tuebingen.de/digitue/theo/

Sources

The volume of transcriptions:

TextLine  Page  TxtRegion
    6599   182        494

List of transcriptions

document                                             TxtRegion  TextLine  Page
Defensio_episcopi_Rottenburgensis                           34       283    14
Die_paepstliche_Unfehlbarkeit                               39       731    20
Stimmen_aus_Maria-Laach/1872                               244      3340    88
Stimmen_aus_Maria-Laach/1871                                73       945    25
Allgemeine_kirchliche_Zeitschrift/1860                      87      1166    30
Grundtlicher_Bericht_von_den_zwo_roten_Neben-Sonnen         17       134     5

The remaining region columns (ImgRegion, LineDrawRegion, GraphRegion, TabRegion, ChartRegion, SepRegion, MathRegion, ChemRegion, MusicRegion, AdRegion, NoiseRegion, UnknownRegion, CustomRegion) are empty for every document.
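Counts of this kind could be reproduced from the PAGE XML files with a short script. The following is only a sketch: it assumes that the table columns correspond to the PAGE XML elements `Page`, `TextRegion` and `TextLine`, and the function names (`local_name`, `count_elements`, `count_directory`) are hypothetical. The repository does not state how its own counts were produced.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Assumption: these PAGE XML element names match the table columns
# Page, TxtRegion and TextLine.
WANTED = {"Page", "TextRegion", "TextLine"}


def local_name(tag: str) -> str:
    """Drop the namespace part of a tag such as '{http://...}TextLine'."""
    return tag.rsplit("}", 1)[-1]


def count_elements(root: ET.Element) -> Counter:
    """Count Page, TextRegion and TextLine elements below one root element."""
    counts = Counter()
    for element in root.iter():
        name = local_name(element.tag)
        if name in WANTED:
            counts[name] += 1
    return counts


def count_directory(directory: str) -> Counter:
    """Sum the counts over all *.xml files below a directory (recursively)."""
    total = Counter()
    for path in sorted(Path(directory).rglob("*.xml")):
        total += count_elements(ET.parse(path).getroot())
    return total


# Example call (the directory name is a placeholder):
# print(count_directory("Die_paepstliche_Unfehlbarkeit"))
```

Matching on the local element name keeps the sketch independent of the PAGE schema version, since the namespace URI differs between versions.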

Extent

After exporting the transcriptions as PAGE XML files, those files were processed in place to strip carriage returns and remove empty (whitespace-only) lines:

  perl -i -ne "tr|\r||d; next if /^\s*$/;print" *.xml
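For environments without Perl, roughly the same cleanup can be sketched in Python. This is an illustrative approximation, not the command that was actually used: the function names `strip_empty_lines` and `clean_files` are hypothetical, and the code works on raw bytes so that only ASCII whitespace counts as "empty".

```python
from pathlib import Path


def strip_empty_lines(data: bytes) -> bytes:
    """Remove all carriage returns, then drop whitespace-only lines."""
    without_cr = data.replace(b"\r", b"")
    # With the carriage returns gone, splitlines() on bytes only splits at b"\n".
    kept = [line for line in without_cr.splitlines(keepends=True) if line.strip()]
    return b"".join(kept)


def clean_files(directory: str = ".") -> None:
    """Rewrite every *.xml file in the directory in place (no backup is kept)."""
    for path in Path(directory).glob("*.xml"):
        path.write_bytes(strip_empty_lines(path.read_bytes()))


# Example call, cleaning the XML files in the current directory:
# clean_files(".")
```

Like `perl -i` without a backup suffix, `clean_files` overwrites the files, so it is safer to run it on a copy first.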