otiai10/gosseract

Losing stderr when running multiple clients at once

uniwisejohannes opened this issue · 1 comment

Sorry if this is silly, but we wanted to hear the developer's thoughts on this.

In our project we initialise 4 different gosseract.Clients one at a time using gosseract.NewClient, and for each client we start a goroutine that calls SetImageFromBytes and GetBoundingBoxes on it. Each goroutine works through a large number of documents one at a time, so up to 4 documents can be in progress at once.

This seemingly corrupts stderr so that we lose all the logs in our system.

Is it just not possible to run 4 separate TessBaseAPIs at once?

We are using ENV OMP_THREAD_LIMIT=1 in our Dockerfile and our pod has 4 cores.

We also have RUN CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o /build/bin/service main.go

Our image base is debian:12

That's because gosseract currently hijacks stderr.
This was a workaround from when we first implemented gosseract, and we don't believe it is the best approach.
We still need to identify a better one. In the meantime, I'm thinking about making the stderr hijack something you can opt out of.
What do you think?
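
To illustrate why a process-wide stderr redirect can go wrong with several clients running at once, here is a minimal model in plain Go. It is only a sketch: `target`, `hijack`, `restore`, `overlapped` and `serialized` are hypothetical names, a string stands in for the real file descriptor, and none of this is gosseract's actual code. It assumes the hijack follows a save, redirect, restore pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// target models the single, process-wide destination of stderr.
// All names here are hypothetical; this is not gosseract's code.
var target = "real stderr"

// hijack points stderr at a temporary capture and returns what it replaced.
func hijack(capture string) (saved string) {
	saved = target
	target = capture
	return saved
}

// restore puts back whatever the matching hijack call saved.
func restore(saved string) { target = saved }

// overlapped replays one unlucky interleaving of two clients, A and B.
func overlapped() string {
	target = "real stderr"
	a := hijack("capture A") // A saves "real stderr"
	b := hijack("capture B") // B saves "capture A"
	restore(a)               // A puts "real stderr" back
	restore(b)               // B puts "capture A" back: logs now go to a dead capture
	return target
}

// serialized runs the same two clients concurrently, but a mutex keeps each
// hijack/restore pair from overlapping with the other.
func serialized() string {
	target = "real stderr"
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, name := range []string{"capture A", "capture B"} {
		wg.Add(1)
		go func(capture string) {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			saved := hijack(capture)
			restore(saved)
		}(name)
	}
	wg.Wait()
	return target
}

func main() {
	fmt.Println("overlapped:", overlapped())
	fmt.Println("serialized:", serialized())
}
```

In this model the overlapped run ends with stderr still pointing at client A's capture, which would match the symptom of logs disappearing. Whether wrapping gosseract calls in a mutex helps in practice depends on which calls perform the redirect, which this thread does not establish, so treat it as something to try and not as a confirmed fix.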