Test data for testing specs and software in @OCR-D
- SBB0000F29300010000: Pages 1-5 of http://resolver.staatsbibliothek-berlin.de/SBB0000F29300010000
- kant_aufklaerung_1784: http://ocr-d.de/sites/all/GTDaten/kant_aufklaerung_1784.zip, with TIFF compressed with JPEG + METS for second page
- kant_aufklaerung_1784-binarized: http://ocr-d.de/sites/all/GTDaten/kant_aufklaerung_1784.zip, with binarized/gray produced by ocropus-nlbin + METS for all
- kant_aufklaerung_1784-complex: Result of running https://github.com/bertsky/workflow-configuration/blob/master/crop-anyocr-binarize-page-olena-sauvola-denoise-ocropy-deskew-page-ocropy-segment-tesseract-ocropy-dewarp-ocr-ocropy-tesseract.mk on kant_aufklaerung_1784
- kant_aufklaerung_1784-page-block-line-word_glyph: Sample PAGE file with regions, words and glyphs
- test.ocrd.zip: OCRD-ZIP of kant_aufklaerung_1784
- param-binarize.json: Sample parameter JSON file
- sample_bagit-with-fetch: OCRD-ZIP of PPN595930174 (simplified to file groups GDZOCR and PRESENTATION)
- dfki-testdata: Test assets from https://github.com/syedsaqibbukhari/docanalysis
- pembroke_werke_1766: Page 10 of http://resolver.staatsbibliothek-berlin.de/SBB0001CA7900000000
- column-samples: Samples for column detection
- DIBCO11-machine_printed: Test set for the DIBCO11 challenge
- page_dewarp: Dewarping samples by @mzucker
- leptonica_samples: Sample facsimile from the leptonica computer vision library
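
The OCRD-ZIP assets above (test.ocrd.zip, sample_bagit-with-fetch) are BagIt bags serialized as ZIP files. A minimal sketch for inspecting one with the Python standard library might look like the following. The helper names are hypothetical, and the check assumes that `bagit.txt` sits at the root of the archive; it is not a replacement for a real validator.

```python
import zipfile


def list_bag_entries(zip_source):
    """Return the sorted member names of a ZIP archive.

    zip_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(archive.namelist())


def has_bagit_declaration(zip_source):
    """Check for a top-level bagit.txt (assumed to be at the archive root)."""
    return "bagit.txt" in list_bag_entries(zip_source)
```

For example, `list_bag_entries("test.ocrd.zip")` would list the members of that asset, assuming the file is in the current working directory.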