Primary language: Makefile. License: MIT.

ocrd_page2tei

OCR-D wrapper for [page2tei](https://github.com/tboenig/page2tei)

Introduction

This module provides an OCR-D compliant workspace processor that converts PAGE-XML to TEI.

It wraps the XSL transformation page2tei for OCR-D:

  • For XSL processing, it uses Saxon.

  • For handling METS/PAGE and providing the OCR-D CLI, it is written as a shell script that relies heavily on the OCR-D core bashlib API.

Installation

Requires Java>=8, Saxon and GNU make.
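
Before installing, it can help to confirm that the system-level requirements are on the PATH. The following is only a minimal sketch: the helper `have` is a hypothetical name introduced here, and the script merely prints the reported versions rather than checking that Java is at least version 8. Saxon is not checked, since it is downloaded as a local dependency.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is on the PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the required tools are present.
for tool in java make; do
    if have "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool" >&2
    fi
done

# Print the first line of each version banner for manual inspection.
# (java writes its version to stderr, hence the redirection.)
if have java; then
    java -version 2>&1 | head -n 1
fi
if have make; then
    make --version | head -n 1
fi
```

For GNU make, the version banner should start with "GNU Make"; compare the reported Java version against the requirement by eye.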

To install system dependencies on Ubuntu, run:

sudo make deps-ubuntu

This is equivalent to:

apt install openjdk-8-jre-headless

To install local dependencies (download Saxon and page2tei), run:

make deps

Then, to install this module, run:

make install

Usage

OCR-D processor interface ocrd-page2tei

To be used with PAGE-XML documents in an OCR-D annotation workflow.

Usage: ocrd-page2tei [OPTIONS]

Convert PAGE-XML to TEI-C

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version
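
As an illustration of the options listed above, a typical invocation might look like the following sketch. The METS path, the file group names `OCR-D-OCR` and `OCR-D-TEI`, and the wrapper functions `convert_all` and `convert_page` are assumptions made for this example, not names defined by this module; substitute the ones from your own workspace. Only flags shown in the help text are used.

```shell
#!/bin/sh
# Hypothetical values; adjust them to your workspace.
METS=mets.xml
INPUT_GRP=OCR-D-OCR
OUTPUT_GRP=OCR-D-TEI

# Convert every page of the input file group to TEI.
convert_all() {
    ocrd-page2tei -m "$METS" -I "$INPUT_GRP" -O "$OUTPUT_GRP" -l INFO
}

# Re-convert a single physical page, removing any existing output for it.
# $1 is the physical page ID as it appears in the METS file.
convert_page() {
    ocrd-page2tei -m "$METS" -I "$INPUT_GRP" -O "$OUTPUT_GRP" -g "$1" --overwrite
}

# Only run the processor if it is installed and the METS file exists.
if command -v ocrd-page2tei >/dev/null 2>&1 && [ -f "$METS" ]; then
    convert_all
else
    echo "ocrd-page2tei or $METS not found; nothing was run" >&2
fi
```

To redo a single page afterwards, one could call `convert_page PHYS_0001`, where `PHYS_0001` stands in for whatever page ID your METS file actually uses.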

Testing

None yet.