OCR-D/assets
Test data for testing specs and software in @OCR-D
- SBB0000F29300010000: Pages 1-5 of http://resolver.staatsbibliothek-berlin.de/SBB0000F29300010000
- kant_aufklaerung_1784: http://ocr-d.de/sites/all/GTDaten/kant_aufklaerung_1784.zip, with JPEG-compressed TIFF images and METS for the second page
- kant_aufklaerung_1784-binarized: http://ocr-d.de/sites/all/GTDaten/kant_aufklaerung_1784.zip, with binarized and grayscale images produced by ocropus-nlbin, plus METS for all pages
- kant_aufklaerung_1784-page-block-line-word_glyph: Sample PAGE file with regions, words and glyphs
- test.ocrd.zip: OCRD-ZIP of kant_aufklaerung_1784
- param-binarize.json: Sample parameter JSON file
- sample_bagit-with-fetch: OCRD-ZIP of PPN595930174 (simplified to the file groups GDZOCR and PRESENTATION)
- dfki-testdata: Test assets from https://github.com/syedsaqibbukhari/docanalysis
- pembroke_werke_1766: Page 10 of http://resolver.staatsbibliothek-berlin.de/SBB0001CA7900000000
- dietrich_fuehrer_1839: Full METS for http://digital.slub-dresden.de/id284175080
- column-samples: Samples for column detection
- DIBCO11-machine_printed: Test set for the DIBCO11 challenge
- page_dewarp: Dewarping samples by @mzucker
- leptonica_samples: Sample facsimile from the Leptonica computer vision library
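Two of the assets above (test.ocrd.zip and sample_bagit-with-fetch) are OCRD-ZIPs, i.e. BagIt bags packed as ZIP files. The following is a minimal sketch for peeking into such an archive with the standard library, assuming the bag root is the ZIP root and the METS file sits at the default location data/mets.xml. The function names are hypothetical, and the sketch does not check manifest checksums or any alternative METS path declared in bag-info.txt.

```python
import zipfile


def summarize_bag_entries(names):
    """Summarize member names of an OCRD-ZIP (hypothetical helper).

    Assumes the bag root is the ZIP root and the METS file is at the
    default path data/mets.xml; checksums are not verified.
    """
    names = set(names)
    return {
        # every BagIt bag declares itself with a bagit.txt at its root
        "has_bagit_txt": "bagit.txt" in names,
        # payload lives under data/; data/mets.xml is the assumed default
        "has_default_mets": "data/mets.xml" in names,
        # fetch.txt lists payload files to download instead of shipping them
        "has_fetch_txt": "fetch.txt" in names,
        # payload manifests such as manifest-sha512.txt
        "manifests": sorted(n for n in names if n.startswith("manifest-")),
    }


def inspect_ocrd_zip(path):
    """Open a ZIP file (path or file-like object) and summarize its entries."""
    with zipfile.ZipFile(path) as zf:
        return summarize_bag_entries(zf.namelist())
```

For example, `inspect_ocrd_zip("test.ocrd.zip")` (path relative to wherever the asset was checked out) returns a dictionary; for sample_bagit-with-fetch one would expect `has_fetch_txt` to be true. Splitting the name check from the ZIP handling keeps the logic testable without a real archive.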
Page Schema
For more information and the latest schema, see https://github.com/PRImA-Research-Lab/PAGE-XML/wiki
- schema/2009-03-16.xsd: PAGE XSD, version 2009-03-16
- schema/2010-01-12.xsd: PAGE XSD, version 2010-01-12
- schema/2010-03-19.xsd: PAGE XSD, version 2010-03-19
- schema/2013-07-15.xsd: PAGE XSD, version 2013-07-15
- schema/2016-07-15.xsd: PAGE XSD, version 2016-07-15
- schema/2017-07-15.xsd: PAGE XSD, version 2017-07-15
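To pick which of the XSD files above applies to a given PAGE document, one can read the version date from the root element's namespace. This is a sketch under the assumption that the namespace has the form http://schema.primaresearch.org/PAGE/gts/pagecontent/<version> and that the schema files live under schema/ as listed; the helper name is hypothetical. Actual validation against the chosen XSD needs an XSD-capable library (for instance `lxml.etree.XMLSchema`), which this sketch does not use.

```python
import xml.etree.ElementTree as ET

# Assumed namespace prefix; the version date (e.g. 2013-07-15) follows it.
PAGE_NS_PREFIX = "http://schema.primaresearch.org/PAGE/gts/pagecontent/"


def page_schema_path(xml_text):
    """Return the schema/<version>.xsd path matching a PAGE document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace}localname"
    if not root.tag.startswith("{"):
        raise ValueError("root element has no namespace")
    ns = root.tag[1:].split("}", 1)[0]
    if not ns.startswith(PAGE_NS_PREFIX):
        raise ValueError(f"not a PAGE namespace: {ns}")
    version = ns[len(PAGE_NS_PREFIX):].strip("/")
    return f"schema/{version}.xsd"
```

Passing the text of a PAGE file whose root is `<PcGts xmlns=".../pagecontent/2013-07-15">` yields `schema/2013-07-15.xsd`; a version without a matching file in schema/ would still produce a path, so callers should check that the file exists.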