rhardih/bad

Unable to build tesseract 4.0.0 on docker (Windows PC)

Kunal-git opened this issue · 11 comments

I am trying to build Tesseract 4.0.0 (the .so file) for Android, arm64-v8a, on a Windows PC.
I tried copying the Dockerfile "tesseract-4.0.0.Dockerfile" to a local directory and running
docker build . and got an error

Step 4/19 : FROM bad-tiff:4.0.10-$ARCH AS tiff-dep
pull access denied for bad-tiff, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

I also tried building the image from the URL by running
docker build https://github.com/rhardih/bad/blob/master/tesseract/tesseract-4.0.0.Dockerfile
but again got an error

Downloading build context from remote url: https://github.com/rhardih/bad/blob/master/tesseract/tesseract-4.0.0.Dockerfile 81.11kB
Sending build context to Docker daemon 82.94kB
Error response from daemon: Dockerfile parse error line 7: unknown instruction: <!DOCTYPE

I am new to Docker and cross-compilation, so I am not sure where the problem is. Any suggestions would be great!

And thanks for this repo..!!

If you just copy the Dockerfile, you sidestep the dependency chain specified in the Makefile. Unless you provide the dependent libraries yourself in some other fashion, the build will not work. E.g. Tesseract 4 depends on both leptonica and tiff.

The instructions are detailed in the Building section of the README.

Please let me know if it could be made clearer.
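
As a rough sketch of what going through the Makefile looks like in practice: the helper below is hypothetical (the name build_tesseract is not part of the repo) and assumes a full checkout of rhardih/bad, with make and Docker available on the machine. It refuses to run unless the given directory contains a Makefile, since the "pull access denied for bad-tiff" error above comes from building a single Dockerfile without the dependencies the Makefile is meant to provide.

```shell
#!/bin/sh
# Hypothetical helper, not part of the bad repo: build Tesseract 4.0.0 for
# arm64-v8a through the top-level Makefile instead of a bare `docker build`.
# Assumes $1 is the root of a full checkout of rhardih/bad.
build_tesseract() {
    repo_dir=$1
    if [ ! -f "$repo_dir/Makefile" ]; then
        echo "no Makefile in '$repo_dir'; pass the root of the checkout" >&2
        return 1
    fi
    # -C makes make change into the checkout before reading the Makefile,
    # so the target and its dependencies are resolved from there.
    make -C "$repo_dir" tesseract-arm64-v8a/4.0.0
}
```

For example, after git clone https://github.com/rhardih/bad.git, calling build_tesseract ./bad runs the same make target that is quoted in this thread.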

Sorry for the trouble, but I still don't get it.
I copied the entire repository into a directory. Then I copied the Tesseract 4 Dockerfile out of the tesseract directory as a file named "Dockerfile" (and the tesseract.mk file as well). Then I ran docker build . which gave the same error

Step 4/21 : FROM bad-tiff:4.0.10-$ARCH AS tiff-dep
pull access denied for bad-tiff, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

When I run the command make tesseract-arm64-v8a/4.0.0 as described in the docs example, I simply get the error

make tesseract-arm64-v8a/4.0.0
bash: make: command not found

What am I missing? Are there any additional steps I need to do before this, or am I doing something wrong?
Thanks..!

[P.S. I copied the Tesseract Dockerfile outside the folder because it wasn't running at all when I tried running it from inside the tesseract folder.
Error :

docker build tesseract-4.0.0.Dockerfile
unable to prepare context: context must be a directory: C:\Program Files\Docker Toolbox\TestDocker\bad-master\tesseract\tesseract-4.0.0.Dockerfile

Maybe I made a mistake here]

It's in the error message you get actually. You are missing make.

I don't have anything that runs Cygwin, but this SO answer hints that you need to choose to include make during the installation of Cygwin itself.

With regards to copying docker files in and out of the project; I'd really not recommend that approach, unless you are 100% sure of what you are doing. You should follow the included instructions, to make sure dependencies are compiled and available before compiling a dependent library. This is the exact thing that make solves.
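
A quick way to catch a missing tool before starting the build is a small check like the one below. This is only a sketch: check_tools is a hypothetical helper name, and the list of tools (make and docker) is taken from what this thread uses rather than from the README.

```shell
#!/bin/sh
# Report which of the given command-line tools are missing from PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The build discussed in this thread is driven by make and runs in Docker.
check_tools make docker || echo "install the tools listed above, then retry"
```

On a shell where make is not installed, this reports "missing: make" up front instead of failing with "bash: make: command not found" once the build is attempted.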

libtesseract.zip

OK, so I tried a workaround. I installed Ubuntu in VirtualBox, with Docker, and ran the command make tesseract-arm64-v8a/4.0.0. That successfully generated the libtesseract.so file.
But when I try using it in my project on Android,

[DllImport(TesseractDllName)]
private static extern IntPtr TessBaseAPICreate();
IntPtr tessHandle = IntPtr.Zero;

public bool Init (string lang, string dataPath){
    tessHandle = TessBaseAPICreate();
    ....
}

it fails with the exception

Unable to load DLL 'libtesseract.so': The specified module could not be found.

I have never seen this error before. Any ideas?
Maybe the file hasn't been compiled properly.
[I attached the .so file, in case that helps]

Good to hear you got through the compilation. If the build finished without any errors, you can trust that the library compiled properly. The .so file looks right enough as far as I can tell.

In order to troubleshoot, I would probably start with a known project where you load in a shared library in a similar manner. That way you can be sure that everything is in the right place, and then add, or swap, libtesseract.so instead. This all comes down to how you're building your app and what you are building it in / with, of course. I only have experience with Android apps built with Qt, so that's probably as far as I can be of help.

For reference, this is a simple Qt application that runs on Android:

https://github.com/rhardih/bad/tree/master/tests/tesseract/tesseract-4-0-0

ok.. thanks.. :)
I am using Unity3D for an Android project. I did just that: I replaced the previous libtess.so with the new one.
I will do some more poking around to see what else could be causing the failure.

One question: which additional files do I need to include along with the compiled libtesseract.so for Tesseract 4?
E.g., for Tesseract 3, along with libtess.so, I also need to provide the leptonica (liblept.so), libjpgt.so and libpngt.so files for it to work (as plugins/DLLs).
So, do I also need to provide the leptonica file, or all three additional files, or will libtesseract.so be sufficient by itself?

Because Tesseract will not compile without at least libleptonica, that would in theory be the only other .so file you'd need to include. Tesseract will try to dynamically load libleptonica on startup and throw an error if it's not found.

For testability, however, the compilation as currently set up in bad compiles libleptonica and links it against libtiff, which makes libtiff a transitive dependency of Tesseract as well.

You could compile libleptonica with the configure flag --without-libtiff, which removes this dependency, and still have it work with Tesseract.
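
To see which shared libraries a given build actually asks for, the dynamic section of the .so file can be inspected. The sketch below assumes readelf (from binutils) is installed on the build machine; list_needed is a hypothetical helper name, and what it prints depends entirely on how the library was configured and linked.

```shell
#!/bin/sh
# Print the shared libraries an ELF file declares as required at load time.
# Assumes readelf (from binutils) is installed on the build machine.
list_needed() {
    readelf -d "$1" | grep 'NEEDED'
}
```

Running list_needed libtesseract.so prints one NEEDED line per library the loader will look for. Any listed library that the Android system does not provide itself would have to be shipped next to libtesseract.so, and the same check can be repeated on each of those files to find transitive dependencies.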

@Kunal-git does your Unity project work?
I am running into the same problem.

I was unable to compile Tesseract 4 by this method. I ran into some trouble, though I don't remember the details now. So I compiled Tesseract 4 by a different method.
Here is how I did it: stackoverflow_thread
Hope it helps.