tesseract-ocr/tessdata

Error: LSTM requested, but not present!! Loading tesseract

Pierre918 opened this issue · 5 comments

Hello, I am trying to use the Tesseract LSTM engine with:
tesseract exercices-equations-inequations-3eme-1.png text -l equ --oem 1

But it gives me the following error:
Error: LSTM requested, but not present!! Loading tesseract

I downloaded the .traineddata file from the tessdata repo.
I used this command to get equ.traineddata:
sudo wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata

In fact, it works when I choose --oem 0 or 3, but not with any other value. It also works when I run:
tesseract exercices-equations-inequations-3eme-1.png text -l fra --oem 1
whatever the --oem number is.
Do you have any ideas, please?

PS: I am on Lubuntu and I have Tesseract 4.1.1.

That means that equ has only legacy data.
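
One way to confirm this for any .traineddata file is to list its components with combine_tessdata (shipped with the Tesseract training tools) and look for an lstm entry. The following is only a minimal sketch: has_lstm is a hypothetical helper name, and it assumes the listing uses lines of the form "17:lstm:size=..., offset=...", as printed by Tesseract 4.x.

```shell
# Succeed if a component listing read from stdin contains an LSTM model entry.
# Assumption: entries look like "17:lstm:size=401636, offset=192".
# The anchored pattern does not match related entries such as "lstm-unicharset".
has_lstm() {
    grep -Eq '^[0-9]+:lstm:'
}

# Usage (stderr is redirected in case the listing is written there;
# adjust the path to wherever your traineddata file is installed):
#   combine_tessdata -d equ.traineddata 2>&1 | has_lstm \
#       && echo "LSTM model present" \
#       || echo "legacy data only, --oem 1 will fail"
```

If the check fails for a language, only the legacy engine (--oem 0) can be used with that file.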

And how can I add LSTM data?
Did I download the wrong file?

There is currently no LSTM data available. Someone would have to create it.

Ah OK, that explains my problem.
Do you know when it will be created?
Otherwise, I heard it is possible to create an LSTM model ourselves. Can you point me to any documentation on how to do it?

Thanks a lot.

AFAIK equ was an experiment, and that is why it was not updated to LSTM.
LSTM training is based on words and lines, so it is questionable whether math/equation training makes sense in this context. I believe Google had good reasons for skipping it.
Training is described in the docs; the tools are in a separate repository.
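
Since LSTM training works on text lines, the ground truth is typically a set of line images, each paired with a transcription. As a rough sketch, assuming the tesstrain convention of foo.png or foo.tif next to foo.gt.txt, a small check like the one below can catch images without a transcription before training starts. The function name missing_transcriptions and the directory argument are hypothetical.

```shell
# Print every line image in a ground-truth directory that has no matching
# .gt.txt transcription. Assumed layout: foo.png (or foo.tif) + foo.gt.txt.
missing_transcriptions() {
    dir=$1
    for img in "$dir"/*.png "$dir"/*.tif; do
        [ -e "$img" ] || continue      # skip glob patterns that matched nothing
        base=${img%.*}                 # strip the image extension
        [ -f "$base.gt.txt" ] || printf '%s\n' "$img"
    done
}

# Usage (hypothetical path):
#   missing_transcriptions data/equ-ground-truth
```

An empty result means every image has a transcription file; it says nothing about whether the transcriptions themselves are correct.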