
Collection of OCR-related python tools and wrappers from @OCR-D

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

OCR-D/core

Python modules implementing OCR-D specs and related tools

Introduction

This repository contains the Python packages that form the base for tools within the OCR-D ecosystem.

All packages are also published to PyPI.

The easiest way to install is via pip:

pip install ocrd

# or just the functionality you need, e.g.

pip install ocrd_modelfactory

All Python software released by OCR-D requires Python 3.5 or higher.
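After installing, it can be useful to confirm which of the packages are actually present. The following is a minimal sketch using only the standard library, not part of OCR-D itself. The helper names installed_versions and python_is_supported are hypothetical, and importlib.metadata is only available from Python 3.8 onward, so this check assumes a newer interpreter than the stated 3.5 minimum. It also assumes the distribution names match the package names listed in this README.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Package names as listed in this README (assumed to match the PyPI distribution names).
OCRD_PACKAGES = [
    "ocrd_utils",
    "ocrd_models",
    "ocrd_modelfactory",
    "ocrd_validators",
    "ocrd",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_is_supported(version_info=sys.version_info, minimum=(3, 5)):
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    for name, version in installed_versions(OCRD_PACKAGES).items():
        print(name, version if version is not None else "not installed")
```

A package reported as "not installed" can then be added individually with pip, as described above.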

Packages

ocrd_utils

Contains utilities and constants, e.g. for logging, path normalization, and coordinate calculation.

See README for ocrd_utils for further information.

ocrd_models

Contains file format wrappers for PAGE-XML, METS, EXIF metadata, and related formats.

See README for ocrd_models for further information.

ocrd_modelfactory

Code to instantiate models from existing data.

See README for ocrd_modelfactory for further information.

ocrd_validators

Schemas and routines for validating BagIt, ocrd-tool.json, workspaces, METS, PAGE, CLI parameters, and more.

See README for ocrd_validators for further information.

ocrd

Depends on all of the above and also contains decorators and classes for creating OCR-D processors and CLIs.

Also contains the command line tool ocrd.

See README for ocrd for further information.

Testing

  • Download assets: make assets

  • Test with local files: make test

  • Test with local asset server:

    • Start asset-server: make asset-server
    • make test OCRD_BASEURL='http://localhost:5001/'
  • Test with remote assets:

    • make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'
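The three test modes above differ only in whether OCRD_BASEURL is passed to make, and with which value. As an illustration, the following sketch builds the corresponding command lines. The helper make_test_command is hypothetical and not part of this repository; the make targets and URLs are the ones listed above.

```python
import shlex

# Base URLs taken from the test instructions above.
LOCAL_ASSET_SERVER = "http://localhost:5001/"
REMOTE_ASSETS = "https://github.com/OCR-D/assets/raw/master/data/"


def make_test_command(base_url=None):
    """Build the argument list for `make test`, optionally overriding OCRD_BASEURL."""
    command = ["make", "test"]
    if base_url is not None:
        # make accepts VARIABLE=value overrides as command-line arguments.
        command.append("OCRD_BASEURL=" + base_url)
    return command


if __name__ == "__main__":
    # Print the command for each mode: local files, local asset server, remote assets.
    for url in (None, LOCAL_ASSET_SERVER, REMOTE_ASSETS):
        print(" ".join(shlex.quote(part) for part in make_test_command(url)))
```

Passing one of these lists to subprocess.run(..., check=True) from the repository root would run the tests. For the local asset server mode, this assumes make asset-server has already been started in another terminal.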

See Also