UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
JavaScript · MIT
Issues
Challenges processing textract
#187 opened by joewiz - 8
page__text.xsl is not honoring the reading order
#138 opened by mikegerber - 2
Missing CITATION.cff file for repository
#179 opened by mhucka - 2
ocr-transform alto hocr: HTML, but xmlns=xhtml
#184 opened by jbarth-ubhd - 12
page to hocr: cr_carea vs ocr_carea
#183 opened by jbarth-ubhd - 0
[feature request] Support TSV format
#181 opened by stweil - 2
Broken badge on repo
#178 opened by mhucka - 0
`make all` wants to write to `PREFIX`
#176 opened by stweil - 11
Release version 0.3.0 and 1.0.0
#120 opened by zuphilip - 0
Docker installation
#173 opened by yuvaler1 - 4
Table extraction
#164 opened by kba - 8
Google Cloud Vision to PAGE-XML
#125 opened by kba - 13
"ocr-transform page alto ... ...": loosing text
#123 opened by jbarth-ubhd - 3
Add update mechanism
#119 opened by zuphilip - 1
Support conversion to MiniOCR
#135 opened by kba - 0
Add example files
#159 opened by nichtich - 13
[feature request] Support MacOS
#150 opened by stweil - 0
Feature request: Page concatenation during conversion
#157 opened by jsbien - 6
regression: page-to-alto is missing
#153 opened by bertsky - 8
New Saxon version 10.2 is out
#124 opened by zuphilip - 2
Web interface in Docker container/ Error when uploading document: "Must be either POST with the field 'file'...."
#136 opened by cboulanger - 2
Conversion from ABBYY to ALTO
#147 opened by kba - 0
page page2019: does not work
#145 opened by bertsky - 7
alto to text: too many spaces
#129 opened by jbarth-ubhd - 2
Transformation for ImageWare MyBib
#139 opened by karkraeg - 9
GCV to HOCR or PAGE conversion not working
#121 opened by OmriPi - 7
Proxy support
#133 opened by mikegerber - 15
alto2hocr: Content in BottomMargin is not considered (PrintSpace node is missing in this example)
#89 opened by jtlz2 - 6
Support ALTO 4.0
#81 opened by zuphilip - 6
Support conversion from and to PAGE XML
#79 opened by stweil - 2
GCV2hocr not working: no file
#109 opened by zuphilip - 1
Add hocr__page transformation
#113 opened by zuphilip - 2
Simplify validations
#115 opened by zuphilip - 1
Pretty print option for CLI
#118 opened by zuphilip - 1
loop of files downloading
#93 opened by yanirmr - 0
Extend automated tests in CI
#114 opened by zuphilip - 1
Multiple downloads
#108 opened by zuphilip - 0
Compatibility of XSLT 1.0 with new Saxon HE
#107 opened by zuphilip - 3
multi-choice of files in the web interface
#94 opened by yanirmr - 2
Show version info in command line
#87 opened by TuulaP - 4
No text from OCRopy hOCR
#85 opened by stweil - 0
Converting hOCR to Alto
#96 opened by asor12 - 9
installation problem under macOS 10.13.6
#88 opened by jtlz2