adaptech-cz/Tesseract4Android

SIGSEGV when calling getUTF8Text with custom traineddata

abelokon0711 opened this issue · 6 comments

First of all, I would like to say thank you for your work and effort on this project.

As the title says, we get a SIGSEGV (segmentation fault) when calling getUTF8Text(). This happens when we use traineddata that we trained with the OCR-D repository.

When we use the same custom traineddata from the command line (Tesseract 4.0.0beta), it works fine.
Does anyone have an idea why this is the case?

Hello, see my response here; I think it might be the same problem: #13

After converting the model to integer with this command:

```
combine_tessdata -c custom.traineddata
```

we still get a SIGSEGV, but this time the crash points to this line in pageres.h:

```cpp
WERD_RES *restart_page() {
  return start_page(false);  // Skip empty blocks.
}
```

This usually happens when we try to load a traineddata file that is not in the tessdata_dir.

You said you tried Tesseract 4.0.0beta, but I'm using 4.1.0 here, so you should try your modified traineddata with the Tesseract 4.1.0 command line and see whether it works there at least. I can't help you further with this because I don't know enough about it, sorry.

Tesseract 4.1.0 on our Ubuntu system has no issues with the modified traineddata. Have you used custom traineddata before and gotten it working?

No, I haven't, sorry. Try asking in the official Tesseract forum what could be causing it (and let me know if you find out what is wrong).

Okay, I just created a new project from Tess-two_example, added the AAR built from your project, and it works with the custom model trained with OCR-D. So the mistake seems to be in our application and is not related to this repository. Sorry for the confusion; at least it's now clear that this repository works with modified traineddata.