SIGSEGV when calling getUTF8Text with custom traineddata

Question

abelokon0711 opened this issue 5 years ago · 6 comments

First of all, I would like to say thank you for your work and efforts on this project.

As the title says, we are getting a SIGSEGV (segmentation fault) when calling getUTF8Text(). This happens when we use our custom traineddata, which was trained with the OCR-D repository.

When using this custom traineddata from the command line (tesseract version 4.0.0beta), it works fine.
Does anyone have any idea why this is the case?

Answer 1 · 2019-08-01T22:31:27.000Z

Hello, see my response in #13; I think it might be the same problem.

Answer 2 · 2019-08-02T09:44:26.000Z

After converting the model to integer with this command:

  combine_tessdata -c custom.traineddata

we still get a SIGSEGV, but this time we end up at this line in pageres.h:

  WERD_RES *restart_page() {
    return start_page(false);  // Skip empty blocks.
  }

Usually this happens when we try to use a traineddata filename that is not in the tessdata_dir.
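
One way to rule out a missing file before calling init() is to check the path explicitly. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes the usual tess-two style layout in which the data path passed to init() is the parent directory of a "tessdata" folder containing <language>.traineddata.

```java
import java.io.File;

public class TessdataCheck {

    /**
     * Hypothetical helper, not part of any Tesseract API.
     * Returns null when dataPath/tessdata/language.traineddata looks usable,
     * otherwise a short description of what is wrong.
     */
    static String describeProblem(File dataPath, String language) {
        // Assumption: init() expects the PARENT of the "tessdata" directory.
        File tessdataDir = new File(dataPath, "tessdata");
        if (!tessdataDir.isDirectory()) {
            return "missing directory: " + tessdataDir.getPath();
        }
        File model = new File(tessdataDir, language + ".traineddata");
        if (!model.isFile()) {
            return "missing file: " + model.getPath();
        }
        if (!model.canRead()) {
            return "file is not readable: " + model.getPath();
        }
        if (model.length() == 0) {
            return "file is empty: " + model.getPath();
        }
        return null;
    }

    public static void main(String[] args) {
        // Example arguments: the data path and the language name, e.g. "custom".
        File dataPath = new File(args.length > 0 ? args[0] : ".");
        String language = args.length > 1 ? args[1] : "custom";
        String problem = describeProblem(dataPath, language);
        System.out.println(problem == null ? "traineddata found" : problem);
    }
}
```

If this check passes and the crash still occurs, the file location is probably not the cause, and the model itself or the application code around it is the next thing to look at.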

Answer 3 · 2019-08-02T13:17:31.000Z

You said you tried tesseract 4.0.0beta, but this library uses 4.1.0, so you should try your modified traineddata with the tesseract 4.1.0 command line and see if it at least works there. I can't help you further with this, I don't know enough about it, sorry.

Answer 4 · 2019-08-06T10:02:21.000Z

tesseract 4.1.0 on our Ubuntu system has no issues using the modified traineddata. Have you used custom traineddata in the past and got it working?

Answer 5 · 2019-08-06T13:18:14.000Z

No, I have not, sorry. Try asking in the official Tesseract forum what the cause could be (and let me know if you find out what is wrong).

Answer 6 · 2019-08-06T13:25:53.000Z

Okay, I just created a new project with Tess-two_example, added the built AAR of your project, and it works with the custom model trained by OCR-D. So it seems to be a mistake within our application and not related to this repository. I'm sorry for the confusion; at least it's now clear that this repository works with modified traineddata.