page-xml
There are 23 repositories under the page-xml topic.
UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
mittagessen/kraken
OCR engine for all the languages
BobLd/DocumentLayoutAnalysis
Document Layout Analysis resources for development with PdfPig.
lquirosd/P2PaLA
Page to PAGE Layout Analysis Tool
UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
cneud/ocr-conversion
Conversions between various OCR formats
qurator-spk/dinglehopper
An OCR evaluation tool
kba/transkribus-to-prima
Convert Transkribus PAGE-XML to standard PAGE-XML
UB-Mannheim/blatt
NLP-helper for OCR-ed pages in PAGE XML format
slub/textract2page
Convert AWS Textract JSON to PRImA PAGE XML
VRI-UFPR/page-xml-draw
A powerful CLI tool for visualization and encoding of PAGE-XML files
Heresta/OCR17plus
Data for layout analysis and HTR.
IMAGO-Catalogues-Jjanes/cataloguesSegmentationOCR
Dataset and models for layout analysis and HTR of catalogs
qurator-spk/ocrd_repair_inconsistencies
Automatically re-order lines, words and glyphs to become textually consistent with their parents.
OCR-D/gt_structure_1_1
The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
OCR-D/gt_structure_1_4
The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
tboenig/gt-guidelines
OCR-D guidelines for Ground Truth production
OCR-D/gt-repo-scripts
XSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).
OCR-D/gt_structure_1_2
The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
OCR-D/gt_structure_1_3
The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
tboenig/German-Brazilian-Newspapers-Dataset_1
The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.
tboenig/German-Brazilian-Newspapers-Dataset_2
The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.
VRI-UFPR/ocrd-page-xml-draw
OCR-D wrapper for page-xml-draw