UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
JavaScript · MIT
Issues
Challenges processing textract
#187 opened - 2
page to hocr: cr_carea vs ocr_carea
#183 opened - 0
[feature request] Support TSV format
#181 opened - 1
Missing CITATION.cff file for repository
#179 opened - 2
Broken badge on repo
#178 opened - 0
`make all` wants to write to `PREFIX`
#176 opened - 0
Docker installation
#173 opened - 0
Table extraction
#164 opened - 0
Add example files
#159 opened - 0
regression: page-to-alto is missing
#153 opened - 13
[feature request] Support MacOS
#150 opened - 2
Conversion from ABBYY to ALTO
#147 opened - 0
page page2019: does not work
#145 opened - 2
Transformation for ImageWare MyBib
#139 opened - 8
Support conversion to MiniOCR
#135 opened - 7
Proxy support
#133 opened - 7
alto to text: too many spaces
#129 opened - 8
Google Cloud Vision to PAGE-XML
#125 opened - 8
New Saxon version 10.2 is out
#124 opened - 13
Support conversion from and to Textract JSON
#122 opened - 9
GCV to HOCR or PAGE conversion not working
#121 opened - 11
Release version 0.3.0 and 1.0.0
#120 opened - 3
Add update mechanism
#119 opened - 1
Pretty print option for CLI
#118 opened - 2
Simplify validations
#115 opened - 0
Extend automated tests in CI
#114 opened - 1
Add hocr__page transformation
#113 opened - 2
GCV2hocr not working: no file
#109 opened - 1
Multiple downloads
#108 opened - 0
Compatibility of XSLT 1.0 with new Saxon HE
#107 opened - 0
page2tsv
#99 opened - 21
Converting hOCR to Alto
#96 opened - 6
loop of files downloading
#93 opened - 15
alto2hocr: Content in BottomMargin is not considered (PrintSpace node is missing in this example)
#89 opened - 9
installation problem under macOS 10.13.6
#88 opened - 2
Show version info in command line
#87 opened - 4
No text from OCRopy hOCR
#85 opened - 6
Support ALTO 4.0
#81 opened - 4
Support conversion from and to PAGE XML
#79 opened - 24