pagexml

There are 16 repositories under the pagexml topic.

  • mauvilsa/tesseract-recognize

    Tool that performs layout analysis and/or text recognition using Tesseract and outputs the result in Page XML format

    Language:C++
  • mauvilsa/nw-page-editor

    Simple app for visual editing of Page XML files

    Language:JavaScript
  • andbue/nashi

    Some bits of JavaScript to transcribe scanned pages using PageXML

    Language:HTML
  • TEI4HTR/page2tei

    A repository illustrating the transformation of a PAGE XML file into XML-TEI format, based on experiments carried out for the LECTAUREP project.

    Language:XSLT
  • omni-us/pagexml

    C++ library with a Python wrapper for working with Page XML files

    Language:C++
  • OCR-D/gt-repo-template

    A template for creating a ground truth repo with various functions and features, such as metadata creation, data analysis, and presentation.

  • cconzen/ReadingOrderRecalculation

    Post-process PageXML files to improve their region reading order

    Language:Python
  • lectaurep/lepidemo

    LECTAUREP Pipeline demonstration to TEI Publisher

    Language:Jupyter Notebook
  • BobLd/PublayNetSharp

    Extract and convert PubLayNet data to PageXml format

    Language:C#
  • tboenig/gt_corpus_benchmark

    This repo provides a collection of ground truth data, compiled according to different aspects (layout complexity and font usage). Each item is also described by metadata based on the labeling scheme of OCR-D/PrimaLab.

  • HTR-School-Vienna/2024--late-medieval-latin

    Transcriptions of 15th-century Latin manuscripts (ÖNB Cod. 4680 and 4135) from the 2024/2025 HTR Winter School, following CATMuS guidelines.

  • jahtz/octopy

    Command-line tool for Kraken text segmentation and recognition.

    Language:Python
  • jahtz/pypxml

    A Python library for parsing, converting and modifying PageXML files.

    Language:Python
  • jahtz/tesspage

    Toolset for Tesseract training with PageXML Ground-Truth

    Language:Python
  • Middle-High-German-Conceptual-Database/xquery-pagexml-transkribus-module

    This module provides access to Transkribus PageXML files via XQuery functions. It is designed to be used in the context of a BaseX XML database, but should work with other XML databases as well.

    Language:XQuery
  • SCDH/x2tei-transformations

    Transformation from various formats to TEI

    Language:XSLT