page-xml

There are 23 repositories under the page-xml topic.

UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
Language:C#2.2k 48 556284
mittagessen/kraken
OCR engine for all the languages
Language:Python881 30 561149
BobLd/DocumentLayoutAnalysis
Document Layout Analysis resources and repos for development with PdfPig.
Language:C#624 34 168
UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
Language:JavaScript196 19 8824
lquirosd/P2PaLA
Page to PAGE Layout Analysis Tool
Language:Python191 13 3742
cneud/ocr-conversion
Conversions between various OCR formats
80 5 33
qurator-spk/dinglehopper
An OCR evaluation tool
Language:Python66 5 9016
kba/transkribus-to-prima
Convert Transkribus PAGE-XML to standard PAGE-XML
Language:Python12 9 153
UB-Mannheim/blatt
NLP-helper for OCR-ed pages in PAGE XML format
Language:Python10 3 01
slub/textract2page
Convert AWS Textract JSON to PRImA PAGE XML
Language:Python6 4 103
VRI-UFPR/page-xml-draw
A powerful CLI tool for visualization and encoding of PAGE-XML files
Language:Python6 4 122
Heresta/OCR17plus
Data for layout analysis and HTR.
Language:Python4 1 33
IMAGO-Catalogues-Jjanes/cataloguesSegmentationOCR
Dataset and models for catalogs' layout analysis and HTR
Language:Python2 1 01
qurator-spk/ocrd_repair_inconsistencies
Automatically re-order lines, words and glyphs to become textually consistent with their parents.
Language:Python2 3 63
tboenig/gt-guidelines
OCR-D guidelines for Ground Truth production
Language:XSLT2 0 01
OCR-D/gt_structure_1_1
The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
1 2 01
OCR-D/gt_structure_1_4
The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
1 2 01
Lemmbraalemao-DPB/German-Brazilian-Newspapers-Dataset_1
The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.
Lemmbraalemao-DPB/German-Brazilian-Newspapers-Dataset_2
The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.
OCR-D/gt-repo-scripts
XSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).
Language:XSLT2 12
OCR-D/gt_structure_1_2
The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
2 01
OCR-D/gt_structure_1_3
The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
2 01
VRI-UFPR/ocrd-page-xml-draw
OCR-D wrapper for page-xml-draw
Language:Python3 0
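Most of the tools above read or write PAGE-XML, the PRImA page-content format in which a `PcGts` root holds a `Page` whose `TextRegion` elements contain `TextLine` elements, each with a `Coords` polygon (`points="x1,y1 x2,y2"`) and a `TextEquiv/Unicode` transcription. The snippet below is a minimal sketch using only Python's standard library, assuming the 2019-07-15 schema namespace. Files written against other schema versions use a different namespace URI (pass it via the `ns` parameter), and very old versions store coordinates as `Point` child elements, which this sketch does not handle. The helper names `parse_points` and `extract_lines` are hypothetical and not part of any listed project.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: PAGE schema version 2019-07-15 (other versions differ).
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def parse_points(points: str) -> list[tuple[int, int]]:
    """Turn a PAGE 'points' attribute like '10,20 30,20' into [(10, 20), (30, 20)]."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def extract_lines(xml_text: str, ns: str = PAGE_NS) -> list[dict]:
    """Collect id, polygon and transcription of every TextLine (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    q = {"pc": ns}
    lines = []
    # './/' also finds TextRegions nested inside other TextRegions.
    for region in root.iterfind(".//pc:TextRegion", q):
        # Only direct TextLine children, so no line is counted twice.
        for line in region.iterfind("pc:TextLine", q):
            coords = line.find("pc:Coords", q)
            uni = line.find("pc:TextEquiv/pc:Unicode", q)
            lines.append({
                "region": region.get("id"),
                "id": line.get("id"),
                "points": parse_points(coords.get("points")) if coords is not None else [],
                "text": (uni.text or "") if uni is not None else "",
            })
    return lines


# Tiny hand-written sample (not taken from any listed dataset).
SAMPLE = f"""<PcGts xmlns="{PAGE_NS}">
  <Page imageFilename="page1.png" imageWidth="1000" imageHeight="1400">
    <TextRegion id="r1">
      <Coords points="10,10 500,10 500,100 10,100"/>
      <TextLine id="r1_l1">
        <Coords points="10,10 500,10 500,50 10,50"/>
        <TextEquiv><Unicode>Hello PAGE</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for ln in extract_lines(SAMPLE):
        print(ln["id"], ln["text"], ln["points"])
        # prints: r1_l1 Hello PAGE [(10, 10), (500, 10), (500, 50), (10, 50)]
```

The sample omits the `Metadata` element the schema requires; `ElementTree` does not validate, so a real pipeline would check files against the XSD separately (ocr-fileformat, listed above, offers validation).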

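Evaluation tools such as dinglehopper compare OCR output against ground truth. A common headline metric for this is the character error rate (CER): the Levenshtein edit distance between the two strings divided by the length of the ground truth. The following is a generic sketch of that metric, not dinglehopper's implementation. It counts Python code points, whereas dedicated tools may also normalise Unicode or compare grapheme clusters. Raising an error on empty ground truth is a convention assumed here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(ground_truth: str, ocr: str) -> float:
    """Character error rate = edit distance / ground-truth length."""
    if not ground_truth:
        raise ValueError("CER is undefined for an empty ground truth")
    return levenshtein(ground_truth, ocr) / len(ground_truth)


if __name__ == "__main__":
    print(cer("Hello PAGE", "Hallo PAGE"))  # one substitution in 10 chars: 0.1
```

Because insertions count as errors, CER can exceed 1.0 when the OCR output is much longer than the ground truth.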