TTesseractOCR4 is an Object Pascal binding for tesseract-ocr 4.x, an optical character recognition engine.
Examples were tested in Delphi 10.2.3 (32-bit build for Windows) and Lazarus 1.8 (32-bit build for Windows and Linux in Ubuntu 18.04).
- Clone this repository to a local folder.
- Obtain Tesseract 4.x binaries. I recommend using the latest version, built from the master branch of the tesseract project.
  - Windows: Precompiled binaries can be found in lib\tesseractocr-master.zip. Unpack and copy all DLL files to bin\. Microsoft Visual C++ 2017 Redistributable x86 must be installed on the computer.
  - Linux: sudo apt install tesseract-ocr. This will also install required shared libraries (liblept5 and libtesseract4).
  - Common: Set {$DEFINE USE_CPPAN_BINARIES} accordingly in tesseractocr.consts.pas if using Tesseract libraries built with CPPAN (defined as default).
- Download trained language data files from tesseract-ocr/tessdata/ to bin\tessdata. All examples in this repository require the English data file (eng.traineddata). Additionally, the examples\delphi-console-pdfconvert example requires the osd.traineddata and pdf.ttf files.
  - Linux: Tested with language data from tesseract-ocr/tessdata_fast.
- Open and compile an example project:
  - examples\delphi-console-simple. Recognize text in samples\eng-text.png and write it to console output.
  - examples\delphi-vcl-image. 4 tabs:
    - Image: View input image
    - Text: Recognized text coded as UTF-8
    - HOCR: Recognized text in HTML format
    - Layout: View page layout (paragraphs, text lines, words...)
  - examples\delphi-console-pdfconvert. Convert samples\multi-page.tif (multiple page image file) to a PDF file.
  - examples\lazarus-console-simple. examples\delphi-console-simple for Lazarus.

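
Before compiling an example on Linux, it can help to confirm that the data files named in the setup steps are in place. The following is a minimal sketch, not part of the repository: `check_tessdata` is a hypothetical helper, the `TESSDATA_DIR` variable is an assumption, and the script assumes it is run from the repository root with the data files stored under bin/tessdata.

```shell
#!/bin/sh
# Sketch: report which data files the examples need but cannot find.
# check_tessdata is a hypothetical helper, not part of the repository.
check_tessdata() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    # Exit status 0 when every file exists, 1 otherwise.
    return "$missing"
}

# Assumed layout: run from the repository root.
# All examples need eng.traineddata; the pdfconvert example
# additionally needs osd.traineddata and pdf.ttf.
TESSDATA_DIR=${TESSDATA_DIR:-bin/tessdata}
if check_tessdata "$TESSDATA_DIR" eng.traineddata osd.traineddata pdf.ttf; then
    echo "all data files found in $TESSDATA_DIR"
fi
```

Each missing file is printed on its own line, so the output shows directly which download step still needs to be done.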
License: MIT
