This repository contains scripts and the definition of a Docker container that helps compile Tesseract. If you are looking for a ready-to-use Tesseract 4 Runtime Environment container (and don't want to compile it), please take a look at tesseractshadow/tesseract4re.
If you are not familiar with Docker, please read Docker - Get Started. This compilation procedure is based on:
Prerequisites:
- Install Docker
- Download and unzip this repository
Scripted steps (tested as root, via sudo su):
- ./scripts/1-opt-remove-container.sh - (optional) remove tesseract-ocr if it already exists and you want to start from the beginning (note: all compilation results stored inside the container will be lost).
- ./scripts/2-run-new-container.sh - run the new tesseract-ocr container.
- ./scripts/3-opt-show-ocr-info.sh - show OCR version info.
- ./scripts/4-test-ocr.sh image_url - do some OCR tests. The first argument is an image URL, e.g. https://github.com/vincenthome/tesseract-ocr-compilation/blob/master/test-images/problem.tif?raw=true
- ./scripts/5-opt-build-pkg.sh - (optional) build Leptonica and Tesseract packages and copy them outside the tesseract-ocr container.
- ./scripts/x-pull-container.sh - pull the tesseractshadow/tesseract4cmp image from Docker Hub (automated build using the dockerfile from this repository).
- ./scripts/x-update-src.sh - update the source code of Leptonica and Tesseract.
- ./scripts/x-compile-src.sh - compile Leptonica and Tesseract; this may take tens of minutes.
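The numbered scripts above can be chained. The following is a minimal sketch, not part of this repository: `run_step`, `run_all`, and the `DRY_RUN` variable are hypothetical helpers introduced here for illustration. By default it only prints the commands it would run, so it is safe to try without Docker.

```shell
#!/bin/sh
# Sketch only: chains the numbered scripts from the list above.
# run_step, run_all and DRY_RUN are hypothetical, not part of this repository.

# Default test image is the example URL given in the list above.
IMAGE_URL="${1:-https://github.com/vincenthome/tesseract-ocr-compilation/blob/master/test-images/problem.tif?raw=true}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Start the container, show version info, then OCR the given image.
# Each step runs only if the previous one succeeded.
run_all() {
    run_step ./scripts/2-run-new-container.sh &&
    run_step ./scripts/3-opt-show-ocr-info.sh &&
    run_step ./scripts/4-test-ocr.sh "$1"
}

run_all "$IMAGE_URL"
```

To execute the steps for real, set `DRY_RUN=0` and run the sketch from the repository root, as root if your Docker setup requires it.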
- Clone this repository
- Execute ./dockerfile.build.sh
- Run Container: ./scripts/2-run-new-container.sh
- Show Tesseract version: ./scripts/3-opt-show-ocr-info.sh
- Test Tesseract: ./scripts/4-test-ocr.sh
You can get into the container using SSH:
- localhost:4022
- user: root
- password
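As a small sketch of how the connection details above map onto the standard `ssh` client, the snippet below assembles the command line. The variable names and the `ssh_command` helper are hypothetical; only the host, port, and user come from this README.

```shell
# Sketch: build the ssh command for the running container.
# SSH_HOST, SSH_PORT, SSH_USER and ssh_command are illustrative names only.
SSH_HOST="localhost"
SSH_PORT="4022"
SSH_USER="root"

ssh_command() {
    # -p selects the port on which the container's SSH server is reachable
    echo "ssh -p $SSH_PORT $SSH_USER@$SSH_HOST"
}

ssh_command    # prints: ssh -p 4022 root@localhost
```

Running the printed command opens an interactive session and prompts for the password.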