This repository contains scripts and the definition of a Docker container that helps you compile Tesseract 4. If you are looking for a ready-to-use Tesseract 4 Runtime Environment container (and don't want to compile it yourself), please take a look at tesseractshadow/tesseract4re.
If you are not familiar with Docker, please read Docker - Get Started. This compilation procedure is based on:
Prerequisites:
- Install Docker
- Download and unzip this repository
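Both prerequisites can be checked from a shell before running the scripted steps. The following is a minimal sketch, assuming bash; check_t4_prereqs is a hypothetical helper, not a script shipped with this repository, and it only verifies that docker is on the PATH and that the unzipped repository contains the scripts/ directory.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check (not part of this repository).
# Usage: check_t4_prereqs [REPO_DIR]   (REPO_DIR defaults to the current directory)
check_t4_prereqs() {
  local repo_dir="${1:-.}"
  local status=0

  # Prerequisite 1: Docker must be installed and reachable via PATH.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found in PATH; install Docker first" >&2
    status=1
  fi

  # Prerequisite 2: the unzipped repository must contain scripts/.
  if [ ! -d "$repo_dir/scripts" ]; then
    echo "no scripts/ directory in $repo_dir; download and unzip the repository" >&2
    status=1
  fi

  return "$status"
}
```

For example, run check_t4_prereqs from the unzipped repository directory; a non-zero return status means at least one prerequisite is missing, and each missing item is reported on stderr.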
Scripted steps (tested as root, after sudo su):
- ./scripts/1-pull-container.sh: pull the tesseractshadow/tesseract4cmp image from Docker Hub (an automated build using the dockerfile from this repository).
- ./scripts/2-remove-container.sh: (optional) remove the t4cmp container if it already exists and you want to start from the beginning (note: all compilation results stored inside the container will be lost).
- ./scripts/3-run-new-container.sh: run a new t4cmp container.
- ./scripts/4-update-src.sh: update the Leptonica and Tesseract source code.
- ./scripts/5-compile-src.sh: compile Leptonica and Tesseract; this may take tens of minutes.
- ./scripts/6-test-ocr.sh: run some OCR tests.
- ./scripts/7-build-pkg.sh: (optional) build Leptonica and Tesseract packages and copy them outside the t4cmp container.
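The mandatory steps above can be chained so that a failure (for example, a failed source update) stops the run before compilation starts. This is a minimal sketch, assuming bash and that the scripts are executable and invoked from the repository root, as in the list above; run_t4_steps is a hypothetical wrapper, not part of this repository, and it deliberately skips the optional steps 2 and 7.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with this repository): runs the
# numbered scripts in order and stops at the first failing step.
# Usage: run_t4_steps [SCRIPTS_DIR]   (defaults to ./scripts)
run_t4_steps() {
  local dir="${1:-./scripts}"
  # Optional steps 2 (remove container) and 7 (build packages) are skipped.
  local steps=(
    1-pull-container.sh
    3-run-new-container.sh
    4-update-src.sh
    5-compile-src.sh
    6-test-ocr.sh
  )
  local step
  for step in "${steps[@]}"; do
    echo "==> $step"
    # Invoke each script directly, as the README does with ./scripts/...
    if ! "$dir/$step"; then
      echo "step $step failed; stopping" >&2
      return 1
    fi
  done
}
```

Usage, from the repository root after sourcing the function: run_t4_steps ./scripts. To start from scratch, run ./scripts/2-remove-container.sh manually first.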
To build the tesseractshadow/tesseract4cmp image yourself instead of pulling it from Docker Hub:
- Clone this repository to your $T4_WORKSPACE.
- Execute
docker build -t tesseractshadow/tesseract4cmp $T4_WORKSPACE
(or ./dockerfile.build.sh).
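The manual build can be wrapped with a guard against an unset or wrong $T4_WORKSPACE, which would otherwise hand docker build a bad build context. A minimal sketch, assuming bash; build_t4cmp_image and the DOCKER override variable are hypothetical (setting DOCKER=echo prints the command instead of running it).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository) for the manual build.
# T4_WORKSPACE: directory this repository was cloned into.
# DOCKER: hypothetical override, e.g. DOCKER=echo for a dry run.
build_t4cmp_image() {
  # Refuse to build if the workspace is unset or not a directory.
  if [ -z "${T4_WORKSPACE:-}" ] || [ ! -d "$T4_WORKSPACE" ]; then
    echo "set T4_WORKSPACE to the directory containing the cloned repository" >&2
    return 1
  fi
  # Same command as in the step above.
  "${DOCKER:-docker}" build -t tesseractshadow/tesseract4cmp "$T4_WORKSPACE"
}
```

For example: T4_WORKSPACE=/path/to/your/clone build_t4cmp_image.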
You can get into the container using SSH:
- host: localhost:4022
- user: root
- password: root
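With the OpenSSH client, the connection details above translate to a port flag plus user@host. A minimal sketch, assuming bash and the standard ssh client; t4cmp_ssh and the SSH override variable are hypothetical (setting SSH=echo prints the arguments instead of connecting).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): connect to the running
# t4cmp container on localhost port 4022 as root (password: root).
# Extra arguments are passed through to ssh as a remote command.
# SSH: hypothetical override, e.g. SSH=echo for a dry run.
t4cmp_ssh() {
  "${SSH:-ssh}" -p 4022 root@localhost "$@"
}
```

Called without arguments, t4cmp_ssh opens an interactive shell; ssh prompts for the password.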