LeoFCardoso/pdf2pdfocr

Create language-based Docker images

Brice187 opened this issue · 3 comments

Hey, could you please create Dockerfiles for different languages and upload the tagged images to Docker Hub?

Alternatively, you could add all Tesseract OCR language packages to the Dockerfile, but this would nearly triple the image size:

larsk@MacBook-Pro pdf2pdfocr % docker image ls
REPOSITORY                           TAG                                              IMAGE ID            CREATED             SIZE
pdf2pdfocr                           all-lang                                         a74b8d22d02b        6 seconds ago       1.1GB
pdf2pdfocr                           latest                                           09eccd997dd3        6 minutes ago       417MB

Should I add a PR for this issue?

Hi there. Yes, you're right: adding languages increases the size of the resulting Docker image.

Maybe we could add a parameter to the container that points to a directory with languages. In that case, though, the container would depend on the host operating system.
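The directory idea could look roughly like the sketch below, which builds a `docker run` command that bind-mounts a host folder of `.traineddata` files read-only and points Tesseract at it through the `TESSDATA_PREFIX` environment variable. The container path `/tessdata`, the default image tag, and the helper name `build_run_command` are assumptions for illustration. Whether the pdf2pdfocr image picks this up depends on its Tesseract version and entrypoint (Tesseract 4 and later expect `TESSDATA_PREFIX` to be the directory that holds the `.traineddata` files).

```python
import shlex
from pathlib import Path


def build_run_command(host_lang_dir: str,
                      image: str = "pdf2pdfocr:latest",   # assumed tag
                      container_dir: str = "/tessdata"    # assumed mount point
                      ) -> list[str]:
    """Return a docker run argv that mounts a host tessdata directory read-only."""
    host_path = Path(host_lang_dir).expanduser().resolve()
    return [
        "docker", "run", "--rm",
        # Bind-mount the host language directory into the container.
        "-v", f"{host_path}:{container_dir}:ro",
        # Tell Tesseract where to look for language data.
        "-e", f"TESSDATA_PREFIX={container_dir}",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_run_command("~/tessdata")))
```

Any arguments for pdf2pdfocr itself would be appended after the image name. The image stays small, at the cost of requiring the language files to exist on the host.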

I think it's better to let users build their own images by editing the Dockerfile.

What's your idea for a PR?

These PR ideas came to mind:

  • Just submit a Dockerfile.lang-deu ;)
  • With some sed magic, I could add Dockerfiles for every language.
  • Change Ubuntu to Alpine to get a smaller image (maybe with all languages).
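The per-language Dockerfile idea can also be sketched without sed. The snippet below derives one Dockerfile per language from a base Dockerfile's text by appending an `apt-get install` step for the matching `tesseract-ocr-<lang>` Ubuntu package. The base text used in the demo, the function names, and the assumption that the base image still runs as root at the end of the file are all hypothetical; the project's real Dockerfile may need the step placed elsewhere.

```python
def language_dockerfile(base_text: str, lang: str) -> str:
    """Append an install step for one Tesseract language pack to a Dockerfile."""
    package = f"tesseract-ocr-{lang}"  # Ubuntu naming, e.g. tesseract-ocr-deu
    step = (
        f"# Language pack: {lang}\n"
        "RUN apt-get update \\\n"
        f"    && apt-get install -y --no-install-recommends {package} \\\n"
        "    && rm -rf /var/lib/apt/lists/*\n"
    )
    return base_text.rstrip("\n") + "\n\n" + step


def language_dockerfiles(base_text: str, langs: list[str]) -> dict[str, str]:
    """Map a file name such as Dockerfile.lang-deu to its generated content."""
    return {
        f"Dockerfile.lang-{lang}": language_dockerfile(base_text, lang)
        for lang in langs
    }


if __name__ == "__main__":
    # Stand-in base Dockerfile; in practice this would be read from the repo.
    base = "FROM ubuntu:20.04\nRUN apt-get update && apt-get install -y tesseract-ocr\n"
    for name, text in language_dockerfiles(base, ["deu", "fra", "por"]).items():
        print(f"--- {name} ---")
        print(text)
```

Each generated file could then be built and tagged separately, for example with `docker build -f Dockerfile.lang-deu -t pdf2pdfocr:lang-deu .`.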

But for now, my use case is fulfilled. Thank you for your work!

Good idea! Thank you for posting the issue.