alto-tools

Warning: not fully implemented - work in progress

Python 3 script for performing various operations on ALTO XML files.

Planned features:

  • extract OCR confidence of the ALTO document(s)
  • extract text content of the ALTO document(s)
  • extract graphical elements of the ALTO document(s)
  • extract metadata of the ALTO document(s)
  • transform the ALTO document(s) to target format(s) using XSLT
  • query the content of the ALTO document(s) using XPath
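
As a rough illustration of the first two planned features, the sketch below extracts the text content and a mean OCR confidence from a small ALTO document. It is not the script's actual implementation. It assumes the usual ALTO layout in which `TextLine` elements contain `String` elements carrying `CONTENT` (the word) and `WC` (word confidence) attributes. The names `SAMPLE_ALTO`, `local_name`, `extract_text`, and `mean_confidence` are hypothetical, and the standard library's `xml.etree.ElementTree` is used instead of lxml so the example runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal ALTO v3 document, used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello" WC="0.90"/>
            <SP/>
            <String CONTENT="world" WC="0.70"/>
          </TextLine>
          <TextLine>
            <String CONTENT="again" WC="0.80"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def extract_text(root):
    """Return the text content, one output line per TextLine element."""
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return "\n".join(lines)


def mean_confidence(root):
    """Return the mean of all WC attributes, or None if there are none."""
    scores = [
        float(elem.get("WC"))
        for elem in root.iter()
        if local_name(elem.tag) == "String" and elem.get("WC") is not None
    ]
    return sum(scores) / len(scores) if scores else None


root = ET.fromstring(SAMPLE_ALTO)
print(extract_text(root))
print(mean_confidence(root))
```

Matching on the local element name rather than a fixed namespace URI is a design choice made here so that the same code handles files written against different ALTO schema versions.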

Requirements:

  • lxml for XPath and XSLT support