tesseract-ocr/tesseract

Failed to load language 'eng'

Mathankumar1312 opened this issue · 8 comments

I am facing an issue while using OCR engine modes 0 and 2.

Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

But it happens only when using --oem 0 or --oem 2.

My tesseract command is
tesseract input_image output.txt -l eng --psm 6 --oem 0

Thanks,

You got the error message because you are using a data model that is missing the legacy part. See tesseract --help-oem. So this is your choice, and tesseract works as expected.
Please use the tesseract user forum to ask for support.

I am executing a basic command:
tesseract test.jpg -psm test.pdf -l eng

I receive the same error:

Failed loading language 'eng'

@zdenop am I also using a data model that is missing the legacy part?
Does this software not support OCR for the English language?

tesseract test.jpg -psm test.pdf -l eng is not a valid tesseract command

@zdenop what would be the command to run OCR on an image file and get a PDF file as output?

tesseract --help. This is an issue tracker, not a support forum. Please respect the guidelines for posting issues.
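For anyone landing here with the same question, here is a minimal Python sketch of a command that should produce a PDF. It assumes Tesseract 4 or later, where the page segmentation option is spelled --psm and takes a number, and where the output argument is a base name followed by the pdf config. The helper name build_pdf_command is made up for this example.

```python
import shlex


def build_pdf_command(image, output_base, lang="eng", psm=3):
    """Return the argv list for a tesseract run that writes <output_base>.pdf."""
    # --psm needs a numeric value; in the failing command above, "-psm"
    # was followed directly by "test.pdf".
    # The output is a base name without extension; the trailing "pdf"
    # selects the PDF output config, so tesseract adds ".pdf" itself.
    return ["tesseract", image, output_base, "-l", lang, "--psm", str(psm), "pdf"]


cmd = build_pdf_command("test.jpg", "test")
print(shlex.join(cmd))
```

To actually run it, pass cmd to subprocess.run(cmd, check=True) on a machine where tesseract and eng.traineddata are installed.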

In tpserac.image_to_string(Image.open('1569330507656_-2019-09-24-_18-38-47.jpg'), config="--psm 3") I removed --oem 2 from the config parameter, and it works for me now.

Whether the Legacy engine (--oem 0 or --oem 2) can be used depends on which tessdata set you are using (tessdata, tessdata_fast or tessdata_best). Have a look at this description:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017
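A rough Python sketch of that rule, in case it helps. The mode descriptions follow tesseract --help-oem. The HAS_LEGACY table is an assumption based on the data-files description linked above (only the original tessdata repository ships the legacy model; the fast and best sets are LSTM-only), and oem_supported is a hypothetical helper, not part of any tesseract API.

```python
# Engine modes as listed by `tesseract --help-oem`.
OEM_MODES = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}

# Assumption: which published data sets include the legacy model.
HAS_LEGACY = {
    "tessdata": True,
    "tessdata_fast": False,
    "tessdata_best": False,
}


def oem_supported(data_set, oem):
    """Return True if the given --oem value can load with this data set."""
    if oem not in OEM_MODES:
        raise ValueError(f"unknown OEM mode: {oem}")
    needs_legacy = oem in (0, 2)  # modes 0 and 2 require the legacy model
    return HAS_LEGACY[data_set] or not needs_legacy


for oem in OEM_MODES:
    print(oem, OEM_MODES[oem], oem_supported("tessdata_fast", oem))
```

So with an LSTM-only eng.traineddata, --oem 0 and --oem 2 are expected to fail with "Failed loading language 'eng'", while --oem 1 or the default should load.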

I was just running a simple script on a simple image on Windows, and it was working, and then all of a sudden I started getting this error too. Here are my code examples with images.

rus language works

tess_0_map

import Tesseract from 'tesseract.js'

Tesseract.recognize('./tess_0_map.png', 'rus', {
  // logger: (m) => console.log(m),
}).then(({ data: { text } }) => {
  console.log(text)
})

Returns:

® Развязка

eng language doesn't work

tess_1_stats_exp

import Tesseract from 'tesseract.js'

Tesseract.recognize('./tess_1_stats_exp.png', 'eng', {
  // logger: (m) => console.log(m),
}).then(({ data: { text } }) => {
  console.log(text)
})

Returns:

Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
AdaptedTemplates != nullptr:Error:Assert failed:in file /workspace/tesseract/src/classify/adaptmatch.cpp, line 196
undefined
undefined
node_modules\tesseract.js\src\createWorker.js:173
        throw Error(data);
        ^

Error: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
    at ChildProcess.<anonymous> (node_modules\tesseract.js\src\createWorker.js:173:15)
    at ChildProcess.emit (events.js:315:20)
    at emit (internal/child_process.js:903:12)
    at processTicksAndRejections (internal/process/task_queues.js:81:21)

Weirdly, the eng version actually worked a couple of times, but then it stopped for some reason. I tried reinstalling the package and restarting the console, but that doesn't seem to fix the issue.

UPD.

Whoops, I figured it out! I was tinkering with traineddata, downloaded some examples, and copied eng.traineddata into the folder where my script is placed. It seems that broke tesseract for some reason. But as soon as I deleted this file, leaving the script folder clean, it went back to working as intended!
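If someone else hits this, a quick way to check for the same situation is to look for leftover .traineddata files next to the script. Below is a small Python sketch (Python only for the file scan; it does not touch tesseract.js). The function name find_stray_traineddata is made up, and the demo uses a temporary folder rather than a real project. Why the local copy failed to load is not established here; the log above only shows that ./eng.traineddata was the file being opened.

```python
import tempfile
from pathlib import Path


def find_stray_traineddata(folder="."):
    """Return the names of *.traineddata files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.traineddata"))


# Demo on a throwaway folder that imitates the situation described above.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "eng.traineddata").touch()  # the manually copied file
    (Path(tmp) / "script.js").touch()        # the OCR script itself
    stray = find_stray_traineddata(tmp)

print(stray)
```

Calling find_stray_traineddata() with no argument scans the current working directory, which is where the error message above was looking.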