OCR-D/core
Python modules implementing OCR-D specs and related tools
Introduction
This repository contains the Python packages that form the base for tools within the OCR-D ecosphere.
All packages are also published to PyPI.
The easiest way to install is via pip:
pip install ocrd
# or just the functionality you need, e.g.
pip install ocrd_modelfactory
All Python software released by OCR-D requires Python 3.5 or higher.
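The version requirement and the package list can be checked from Python before running any OCR-D tooling. The following is a minimal sketch, not part of OCR-D itself: the helper names `python_is_supported` and `missing_packages` are hypothetical, and it assumes that each package's import name is the same as the name it is listed under in this README.

```python
import importlib.util
import sys

# Package names as listed in this README; assumed to match the import names.
OCRD_PACKAGES = [
    "ocrd_utils",
    "ocrd_models",
    "ocrd_modelfactory",
    "ocrd_validators",
    "ocrd",
]

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 5)


def python_is_supported(version_info=None):
    """Return True if the (major, minor) version meets the stated minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_packages(names=OCRD_PACKAGES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Packages not installed:", missing_packages())
```

Using `importlib.util.find_spec` only locates the packages and does not import them, so the check has no side effects from package initialization.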
Packages
ocrd_utils
Contains utilities and constants, e.g. for logging, path normalization, coordinate calculation etc.
See README for ocrd_utils for further information.
ocrd_models
Contains file format wrappers for PAGE-XML, METS, EXIF metadata etc.
See README for ocrd_models for further information.
ocrd_modelfactory
Code to instantiate models from existing data.
See README for ocrd_modelfactory for further information.
ocrd_validators
Schemas and routines for validating BagIt, ocrd-tool.json, workspaces, METS, PAGE, CLI parameters etc.
See README for ocrd_validators for further information.
ocrd
Depends on all of the above and contains decorators and classes for creating OCR-D processors and CLIs.
Also contains the command line tool ocrd.
See README for ocrd for further information.
Testing
Download assets (make assets)
Test with local files: make test
- Test with local asset server:
  - Start asset-server: make asset-server
  - make test OCRD_BASEURL='http://localhost:5001/'
- Test with remote assets:
  - make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'
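The three test variants above differ only in whether OCRD_BASEURL is passed to make and which URL it points to. As a rough sketch of that pattern, the hypothetical helper `make_test_command` below builds the argument list for each variant; it is an illustration only and is not part of the repository. The two URLs are the ones given above.

```python
import shlex

# Base URLs taken from the test instructions in this README.
LOCAL_ASSET_SERVER = "http://localhost:5001/"
REMOTE_ASSETS = "https://github.com/OCR-D/assets/raw/master/data/"


def make_test_command(baseurl=None):
    """Build the argument list for `make test`.

    With no `baseurl`, tests run against local files; otherwise the URL is
    passed to make as the OCRD_BASEURL variable.
    """
    cmd = ["make", "test"]
    if baseurl is not None:
        cmd.append("OCRD_BASEURL=" + baseurl)
    return cmd


if __name__ == "__main__":
    # Print each variant as a shell-safe command line without running it.
    for url in (None, LOCAL_ASSET_SERVER, REMOTE_ASSETS):
        print(" ".join(shlex.quote(part) for part in make_test_command(url)))
```

The returned list can be handed to `subprocess.run` from the repository root. For the local asset server variant, make asset-server has to be running first.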