gxrxrdx/tesseract-ocr

Error while running tesseract for a new traindata

Closed this issue · 3 comments

What steps will reproduce the problem?
1. Collect all the files for the traineddata.
2. Make the traineddata.
3. Put the traineddata in the tessdata folder and run tesseract.

What is the expected output? What do you see instead?
Expected output is a text file containing the text from the images. Instead, I 
see the error
Index>=0 &index<size_used:Error:Assert_Failed

Please use labels and text to provide additional information.
I have attached a screenshot of the error and the other files. My language name 
is ban and the font name is sl.

Original issue reported on code.google.com by m.tawfi...@gmail.com on 29 Mar 2015 at 4:03

I think there is a problem with your font_properties file. It seems to have a 
blank line at the top, whereas the blank line should be at the end.
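A rough way to spot this kind of problem is sketched below. It assumes the Tesseract 3 layout of one font per line, `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, with each flag being 0 or 1. The function name `check_font_properties` is made up for this illustration and is not part of Tesseract.

```python
def check_font_properties(text):
    """Return (line_number, problem) tuples for font_properties content.

    Assumed format: <fontname> <italic> <bold> <fixed> <serif> <fraktur>,
    one font per line, flags 0 or 1, file ending with a newline.
    """
    problems = []
    lines = text.split("\n")
    # A file that ends with a newline produces one trailing empty string.
    ends_with_newline = bool(lines) and lines[-1] == ""
    if ends_with_newline:
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        fields = line.split()
        if len(fields) != 6 or any(f not in ("0", "1") for f in fields[1:]):
            problems.append((number, "expected 1 name and 5 flags (0 or 1)"))
    if not ends_with_newline:
        problems.append((len(lines), "file does not end with a newline"))
    return problems


if __name__ == "__main__":
    # A leading blank line and no final newline, as suspected in this issue.
    for number, problem in check_font_properties("\nsl 0 0 0 0 0"):
        print(number, problem)
```

An empty list means nothing suspicious was found; it does not guarantee that training will succeed.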

I was able to generate the traineddata with your files in jtessboxeditor (I 
needed to add the words list and the frequent words list, and rename the font 
properties file to the naming convention needed by the program).

BTW, there is already traineddata for Bangla - please see

https://code.google.com/p/tesseract-ocr/source/browse/ben.traineddata?repo=tessdata

and also see

https://code.google.com/p/tesseract-ocr/source/browse?repo=langdata#git%2Fben


Original comment by shreeshrii on 30 Mar 2015 at 8:50

No, this will not work if I do not leave a blank space in front of the first 
line; however, I have the same tif file as input. By the way,

Original comment by m.tawfi...@gmail.com on 31 Mar 2015 at 2:27

You did not follow the instructions [1]: e.g. font_properties.txt does not meet 
the "Requirements for text input files", so I guess you did not create valid 
traineddata.
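For reference, a minimal sketch of checking a file against those requirements, assuming they are: ASCII or UTF-8 without a BOM, Unix line endings, and a newline as the last character. The helper name `check_text_input` is hypothetical; see the wiki page [1] for the authoritative list.

```python
def check_text_input(data: bytes):
    """Return a list of problems found in the raw bytes of a text input file."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 byte order mark")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    if b"\r" in data:
        problems.append("contains carriage returns (expected Unix line endings)")
    if not data.endswith(b"\n"):
        problems.append("last character is not a newline")
    return problems


if __name__ == "__main__":
    # Windows line endings and a BOM, both of which would be flagged.
    print(check_text_input(b"\xef\xbb\xbfsl 0 0 0 0 0\r\n"))
```

Read the file in binary mode (`open(path, "rb").read()`) before passing it in, so that Python does not silently translate the line endings.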

Anyway, your issue is invalid, because for support you should use the tesseract 
user forum. The issue tracker should be used only for reporting problems with 
Google-produced traineddata files.

[1] https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

Original comment by zde...@gmail.com on 9 Apr 2015 at 8:06

  • Changed state: Invalid