GT-FRAKTUR

gt-fraktur is the Ground Truth (GT) data for Fraktur/Gothic prints from the 19th Century, released by UB, Uni-Tübingen as Open Data under the CC0 public license.


§1. GT Data

This repository contains transcriptions of selected pages from 19th Century books as listed below. The original TIFF images used for OCR transcription of the following publications are published on Archive.org under the CC0 public license.

§1.1. Shelfmark

The shelfmarks / DigitalIDs of the 19th Century Fraktur prints selected for transcription:

# FolderName NumberOfPages URL-Shelfmark-DigitalID Comments
01. agtck_1834_02 15 pgs http://idb.ub.uni-tuebingen.de/opendigi/agtck_1834_02
02. akzs_1860 24 pgs http://idb.ub.uni-tuebingen.de/opendigi/akzs_1860
03. artl_001 20 pgs http://idb.ub.uni-tuebingen.de/opendigi/artl_001
04. artl_002 18 pgs http://idb.ub.uni-tuebingen.de/opendigi/artl_002 Error in 1 image.
05. drey1834 5 pgs http://idb.ub.uni-tuebingen.de/opendigi/drey1834
06. harless1834 7 pgs http://idb.ub.uni-tuebingen.de/opendigi/harless1834
07. kath_1830_035 18 pgs http://idb.ub.uni-tuebingen.de/opendigi/kath_1830_035
08. litrdsch_1875 38 pgs http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1875 Errors in 2 images.
09. stml_1871_01 22 pgs http://idb.ub.uni-tuebingen.de/opendigi/stml_1871_01
10. thlblb_1866 25 pgs http://idb.ub.uni-tuebingen.de/opendigi/thlblb_1866 Errors in 3 images.
11. zpkt_1832_01 8 pgs http://idb.ub.uni-tuebingen.de/opendigi/zpkt_1832_01
12. zpk_1838_01 7 pgs http://idb.ub.uni-tuebingen.de/opendigi/zpk_1838_01
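
For readers who want to process this listing programmatically, the following is a minimal sketch in Python. It assumes the rows are available as plain text in exactly the layout shown above (index, folder name, page count followed by "pgs", URL, optional comment). The names `Entry`, `ROW`, and `parse_row` are hypothetical helpers introduced for illustration; they are not part of this repository.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One row of the shelfmark listing (hypothetical helper type)."""
    index: int
    folder: str
    pages: int
    url: str
    comment: str


# Assumed row layout: "NN. <folder> <pages> pgs <url> [comment]"
ROW = re.compile(r"^(\d+)\.\s+(\S+)\s+(\d+)\s+pgs\s+(\S+)\s*(.*)$")


def parse_row(line: str) -> Optional[Entry]:
    """Parse one listing row; return None for headers or malformed lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    index, folder, pages, url, comment = match.groups()
    return Entry(int(index), folder, int(pages), url, comment.strip())


# Two rows copied from the listing above.
rows = [
    "04. artl_002 18 pgs http://idb.ub.uni-tuebingen.de/opendigi/artl_002 Error in 1 image.",
    "05. drey1834 5 pgs http://idb.ub.uni-tuebingen.de/opendigi/drey1834",
]

entries = [entry for entry in map(parse_row, rows) if entry is not None]
for entry in entries:
    print(entry.folder, entry.pages, entry.comment or "(no comment)")
```

Lines that do not match the assumed layout, such as the header row, yield `None` and are skipped rather than raising an error.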

§1.2. Quality Issues

Details of the page quality issues observed during the transcription process:

# Shelfmark-DigitalID Quality Bugs
1. artl_002 artl_002_00010.tif has bad alignment
2. litrdsch_1875 Misprint
3. litrdsch_1875 Misprint: litrdsch_1875_0146.tif (page 28); lines 6-38 in the left column
4. thlblb_1866 thlblb_1866_00037.tif: the word "Redaction" is printed with a crossed 'o' (e.g. ø, Unicode: U+00F8) in multiple places on the page; these were manually corrected to a regular "o" during transcription.
5. thlblb_1866 thlblb_1866_00121.tif, right column: it seems like the long ſ was corrected manually
6. thlblb_1866 thlblb_1866_00425.tif, left column: the word "fünfte" is blurred; it looks as if there are two "f".
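
The crossed-'o' correction described in item 4 was done by hand. As an illustration only, a check of this kind could be sketched in Python as follows. The sketch assumes a transcription is available as a plain Unicode string; the repository's actual file format and layout are not stated here, and `find_crossed_o` and `replace_crossed_o` are hypothetical helper names.

```python
from typing import List, Tuple

CROSSED_O = "\u00f8"  # 'ø', the character named in item 4


def find_crossed_o(text: str) -> List[Tuple[int, str]]:
    """Return (line number, word) pairs for every word containing U+00F8."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            if CROSSED_O in word:
                hits.append((lineno, word))
    return hits


def replace_crossed_o(text: str) -> str:
    """Replace every U+00F8 with a regular 'o' (assumes no legitimate 'ø')."""
    return text.replace(CROSSED_O, "o")


# Made-up sample text, not taken from the actual transcriptions.
sample = "Die Redacti\u00f8n\nder Redaction"

print(find_crossed_o(sample))
print(replace_crossed_o(sample))
```

A blanket replacement like this is only safe for texts where 'ø' never occurs legitimately, so listing the hits first and reviewing them is the more cautious route.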

§2. LICENSE

  • This data is released by UB, Uni-Tuebingen as Open Data under the CC0 public license.