tesseract-ocr/tessdata

Failed loading language 'eng'

Alookima21 opened this issue · 2 comments

I installed Tesseract using brew on my M1 Pro, and my current version is:

tesseract 5.3.4
 leptonica-1.84.1
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.12 : libwebp 1.3.2 : libopenjp2 2.5.2
 Found NEON
 Found libarchive 3.7.2 zlib/1.2.12 liblzma/5.4.4 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
 Found libcurl/8.4.0 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.55.1

I have checked that eng.traineddata is installed and that the path is correct. However, when I run Tesseract from my code, I get this error:

Error: Tesseract (legacy) engine requested, but components are not present in /opt/homebrew/share/tessdata/eng.traineddata!!
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

I have uninstalled and reinstalled Tesseract, and I have even tried manually downloading the eng model from the GitHub repo and pointing Tesseract at a custom path, but it throws the same error. I was hoping someone could help with this.

Please use the Tesseract user forum for questions.

Typically this error occurs because users did not download eng.traineddata itself, but the web page for that file. Try opening /opt/homebrew/share/tessdata/eng.traineddata in a text editor. If it contains HTML code, that is the reason.
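
The text-editor check above can also be scripted. This is only a minimal sketch: the helper name looks_like_html and the 512-byte probe size are made up for illustration, and it only detects a file that begins with an HTML doctype or <html> tag. It does not validate the traineddata format in any way.

```python
def looks_like_html(path, probe_bytes=512):
    """Return True if the file at `path` starts like an HTML page.

    Hypothetical helper: it only inspects the first `probe_bytes` bytes
    and does not check that the file is a valid traineddata file.
    """
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes)
    # Ignore leading whitespace and letter case before comparing.
    head = head.lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


if __name__ == "__main__":
    import sys

    # Default to the path from the error message; pass another path as argv[1].
    default = "/opt/homebrew/share/tessdata/eng.traineddata"
    target = sys.argv[1] if len(sys.argv) > 1 else default
    try:
        if looks_like_html(target):
            print(f"{target} looks like a saved web page, not a model file")
        else:
            print(f"{target} does not start with HTML markup")
    except OSError as exc:
        print(f"could not read {target}: {exc}")
```

If the check reports HTML, the file was most likely saved from the repository's file view page and should be downloaded again as the raw file.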

Ah, sorry, I (and you) should have read the error message. "Tesseract (legacy) engine requested, but components are not present" gives the reason for the failure. You have installed a fast model which includes a neural network for the LSTM engine, but which does not support the legacy OCR engine.
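
One possible way to act on this diagnosis is to request the LSTM engine explicitly, so that the legacy components are never needed. The sketch below only builds the command line; the function name build_tesseract_cmd and the file name page.png are hypothetical, and it assumes the calling code was asking for the legacy engine (for example via --oem 0 or --oem 2), which the issue does not show. The alternative is to keep the engine setting and install an eng.traineddata that contains the legacy components, such as the one in the tesseract-ocr/tessdata repository.

```python
import shlex


def build_tesseract_cmd(image, lang="eng", oem=1, tessdata_dir=None):
    """Build a tesseract command line that prints recognized text to stdout.

    Hypothetical helper for illustration. `oem` follows Tesseract's
    --oem values: 0 = legacy only, 1 = LSTM only, 2 = legacy + LSTM,
    3 = default (based on what is available).
    """
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0, 1, 2 or 3")
    cmd = ["tesseract", image, "stdout"]
    if tessdata_dir is not None:
        # Point Tesseract at a custom model directory.
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd += ["-l", lang, "--oem", str(oem)]
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_tesseract_cmd("page.png")))
```

With --oem 1 the fast model installed by brew should be sufficient, since it includes the neural network for the LSTM engine.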