
Ground truth for Jacob Grimm's Weisthümer

License: Creative Commons Zero v1.0 Universal (CC0-1.0)

Weisthümer

This repository contains transcriptions of 35 pages from Jacob Grimm's seven-volume work "Weisthümer". They can be used as ground truth for training or validating OCR models.

Typeface class:

Antiqua

Languages:

Several variants of Middle High German; Latin

Special characters:

Roman numerals, superscripts, section sign (§), long s (ſ), circumflex (â), caron (ǎ), acute accent (á), ring above (å), superscript e as umlaut marker (aͤ), Greek letters in cursive forms: theta (ϑ), beta (β), Pi (Π).
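Several of these characters can be encoded in more than one way. For example, "â" can be the single code point U+00E2 or "a" followed by the combining circumflex U+0302, so it helps to normalize text before building a character set for OCR training. The following Python sketch lists the code points behind each character above and counts characters after NFC normalization. It assumes "aͤ" is stored as "a" followed by U+0364 COMBINING LATIN SMALL LETTER E. This README does not state which normalization form the transcriptions use, and the names `SPECIAL`, `code_points` and `char_inventory` are illustrative only.

```python
import unicodedata
from collections import Counter

# Special characters listed in this README, written as explicit code points.
# Assumption: "aͤ" is encoded as "a" + U+0364 (Unicode has no precomposed form for it).
SPECIAL = {
    "section sign": "\u00a7",
    "long s": "\u017f",
    "circumflex": "\u00e2",
    "caron": "\u01ce",
    "acute": "\u00e1",
    "ring above": "\u00e5",
    "superscript e": "a\u0364",
    "theta symbol": "\u03d1",
    "beta": "\u03b2",
    "capital pi": "\u03a0",
}


def code_points(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in s."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


def char_inventory(lines: list[str]) -> Counter:
    """Count code points across lines after NFC normalization, so that
    precomposed and decomposed spellings of a character are counted together."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(unicodedata.normalize("NFC", line))
    return counts


if __name__ == "__main__":
    for label, s in SPECIAL.items():
        print(f"{label:14} {s}  {code_points(s)}")
    # "a" + combining circumflex and precomposed "â" end up as the same character.
    print(char_inventory(["a\u0302", "\u00e2"]))
```

NFC is used here rather than NFKC on purpose: NFKC would map long s (ſ) to "s" and the theta symbol (ϑ) to "θ", erasing distinctions that the ground truth records.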

Sources:

The transcriptions are based on digitised scans available on archive.org:
Volume 1: https://archive.org/details/bub_gb_2J0ZKYG7on8C
Volume 2: https://archive.org/details/bub_gb_LFpLZSYYg34C
Volume 3: https://archive.org/details/bub_gb_o6S3yrj9TkwC
Volume 4: https://archive.org/details/bub_gb_eAqsmQrcWcQC
Volume 5: https://archive.org/details/bub_gb_MMcFAAAAQAAJ
Volume 6: https://archive.org/details/weisthmer02drongoog
Volume 7: https://archive.org/details/weisthmer09maurgoog
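To retrieve the scans programmatically, the archive.org identifiers above can be plugged into the public archive.org Metadata API (`https://archive.org/metadata/<identifier>`), which returns a JSON description of an item including its file list. The following Python sketch assumes that API; the helper names are hypothetical and not part of this repository, and this README does not say which image files correspond to the 35 transcribed pages.

```python
import json
import urllib.request

# archive.org identifiers, copied from the volume list above.
VOLUMES = {
    1: "bub_gb_2J0ZKYG7on8C",
    2: "bub_gb_LFpLZSYYg34C",
    3: "bub_gb_o6S3yrj9TkwC",
    4: "bub_gb_eAqsmQrcWcQC",
    5: "bub_gb_MMcFAAAAQAAJ",
    6: "weisthmer02drongoog",
    7: "weisthmer09maurgoog",
}


def details_url(identifier: str) -> str:
    """Human-readable item page on archive.org."""
    return f"https://archive.org/details/{identifier}"


def metadata_url(identifier: str) -> str:
    """JSON endpoint of the archive.org Metadata API for one item."""
    return f"https://archive.org/metadata/{identifier}"


def list_files(identifier: str, timeout: float = 30.0) -> list[str]:
    """Fetch an item's metadata and return the names of its files.
    Requires network access; not called in the demo below."""
    with urllib.request.urlopen(metadata_url(identifier), timeout=timeout) as resp:
        data = json.load(resp)
    return [f["name"] for f in data.get("files", [])]


if __name__ == "__main__":
    for volume, identifier in VOLUMES.items():
        print(volume, details_url(identifier), metadata_url(identifier))
```

Building URLs is kept separate from fetching so the volume mapping can be checked offline; `list_files` is only needed when actually downloading.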

Further details on the transcription and training workflow can be found in the repository's Wiki.