Docker Image with latest Tesseract OCR Version 5.x.x built from sources.
The sources are pulled from the latest main branch and latest releases of the Tesseract OCR project.
Docker Hub: https://hub.docker.com/r/franky1/tesseract
Pull the docker image from Docker Hub:
docker pull franky1/tesseract
Mount your image data to the /tmp directory and run the Tesseract OCR container with the required command line options. For example, to run the container with the test image:
docker run -it -v ${PWD}/testdata:/tmp --rm franky1/tesseract \
tesseract english.png output --oem 1 -l eng
For the Tesseract command line options, please refer to the Tesseract Manual
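To process several images in one go, the single call above can be wrapped in a loop. The following is a minimal sketch under a few assumptions: the function name `ocr_all` is hypothetical, the images are PNG files, and `-it` is left out because no interactive terminal is needed.

```shell
# ocr_all is a hypothetical helper, not part of this image or of Tesseract.
# Usage: ocr_all [absolute-image-dir] [language]
# The directory must be an absolute path so that Docker treats it as a bind mount.
ocr_all() (
  dir="${1:-${PWD}/testdata}"
  lang="${2:-eng}"
  for img in "$dir"/*.png; do
    # Skip the unexpanded glob when the folder holds no PNG files.
    [ -e "$img" ] || continue
    name=$(basename "$img")
    # Output base is the file name without extension; Tesseract appends .txt.
    docker run --rm -v "$dir":/tmp franky1/tesseract \
      tesseract "$name" "${name%.*}" --oem 1 -l "$lang"
  done
)
```

Calling `ocr_all "${PWD}/testdata" eng` starts one container per image, and the text files end up next to the images in the mounted folder.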
Check whether the languages mounted from your local subfolder ./tessdata are available in the Docker container.
Be aware that the mounted folder hides the languages installed in the Docker image, so only your local languages are available while it is mounted. Example with a local folder that contains the French language:
docker run -it -v ${PWD}/testdata:/tmp \
-v ${PWD}/tessdata:/usr/local/share/tessdata/ \
--rm franky1/tesseract
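Before mounting, it can help to see which language codes the local folder actually provides. The following is a minimal sketch and not part of this repository: the helper name `local_langs` is hypothetical, and it assumes the trained data files follow the usual `<lang>.traineddata` naming.

```shell
# local_langs is a hypothetical helper, not part of this image or of Tesseract.
# Print the language codes found in a local tessdata folder, one per line.
local_langs() (
  dir="${1:-./tessdata}"
  for f in "$dir"/*.traineddata; do
    # Skip the unexpanded glob when the folder holds no trained data.
    [ -e "$f" ] || continue
    # Strip the directory and the .traineddata suffix to get the code.
    basename "$f" .traineddata
  done
)
```

Calling `local_langs ./tessdata` prints one code per line. The result can be compared with the output of `tesseract --list-langs` run inside the container.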
Test the mounted languages in the Docker container with a sample image. Example with the French language:
docker run -it -v ${PWD}/testdata:/tmp \
-v ${PWD}/tessdata:/usr/local/share/tessdata/ \
--rm franky1/tesseract \
tesseract french.jpg output --oem 1 -l fra
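The French example expects fra.traineddata in the local tessdata folder. If the file is not there yet, it could be fetched from the tessdata_best repository. The sketch below rests on assumptions: the helper names `traineddata_url` and `fetch_lang` are hypothetical, and the URL pattern assumes the files sit in the root of the repository's main branch.

```shell
# Both helpers are hypothetical and not part of this image or of Tesseract.

# Build the download URL for one language code.
# Assumption: <lang>.traineddata sits in the root of the main branch of tessdata_best.
traineddata_url() (
  printf 'https://github.com/tesseract-ocr/tessdata_best/raw/main/%s.traineddata\n' "$1"
)

# Download one language into a local tessdata folder (default: ./tessdata).
fetch_lang() (
  lang="$1"
  dir="${2:-./tessdata}"
  mkdir -p "$dir"
  # -f fails on HTTP errors, -L follows the redirect GitHub issues for raw files.
  curl -fsSL -o "$dir/$lang.traineddata" "$(traineddata_url "$lang")"
)
```

For example, `fetch_lang fra` would place fra.traineddata in ./tessdata, ready to be mounted as in the example above.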
Alternatively, you can build a new Docker image if you want other languages; see the next section.
For details, have a look at the Dockerfile.
- Git clone this repo.
- Add your required languages to the languages.txt file.
- (a) Build the docker image from scratch, if you want the latest sources from the main branch.
docker build --tag tesseract .
- (b) Build the docker image from scratch, if you want a specific release version.
docker build --tag tesseract --build-arg TESSERACT_VERSION=5.0.0 .
- Run Tesseract OCR container with test image:
docker run -it --name tesseract -v ${PWD}/testdata:/tmp --rm \
tesseract tesseract english.png output --oem 1 -l eng
- The only target currently supported by this docker image is linux/amd64.
- The working directory for OCR images is /tmp inside the container. See example above.
- The directory for trained data is /usr/local/share/tessdata/ inside the container. See example above.
- This image was built without the Tesseract training tools.
- This image currently includes only the following languages:
  - English: tessdata_best > eng.traineddata
  - German: tessdata_best > deu.traineddata
- If you need other languages, you have to build your own image or mount trained data to the /usr/local/share/tessdata/ directory. See example above.
- Overview of supported languages: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
- Trained models with support for legacy and LSTM OCR engine: https://github.com/tesseract-ocr/tessdata
- Fast integer versions of trained LSTM models: https://github.com/tesseract-ocr/tessdata_fast
- Best (most accurate) trained LSTM models: https://github.com/tesseract-ocr/tessdata_best
- Docker Hub: https://hub.docker.com/r/franky1/tesseract
- Original Tesseract GitHub Repository: https://github.com/tesseract-ocr/tesseract
- Original Tesseract Documentation: https://tesseract-ocr.github.io/
- Original Tesseract Manual: https://tesseract-ocr.github.io/tessdoc/
- More tessdata_best languages: https://github.com/tesseract-ocr/tessdata_best
- Update README.md to latest Dockerfile and Usage
- Add workflow_dispatch to GitHub workflows
- Add dependabot on GitHub
- Add vulnerability scanning in GitHub Actions with Snyk
- Add GitHub Action to check container efficiency with Dive https://github.com/MartinHeinz/dive-action
- Add badges to README.md
- Add documentation for GitHub Actions Workflow
- Add more inline comments in GitHub Actions related files
- Build image for more targets
- Building Tesseract with TensorFlow?
- Building Tesseract with Training tools?
- Change build in Dockerfile according to instructions in Compiling-GitInstallation.md
- 27.07.2022: Currently the build of the main source branch fails; the reason is unknown.
If you find any bugs or have requests regarding this Docker image, please open an issue in this GitHub repository.
27.07.2022: The Docker image is ready for use. Some slight improvements are still possible, and there are sometimes build issues.