UB-Mannheim/tesseract

Tesseract 5.0-alpha only working with --oem 3

noteven2degrees opened this issue · 1 comments

I am running tesseract under Windows 10 Pro with the following version:
tesseract v5.0.0-alpha.20200328
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
 Found libcurl/7.59.0 OpenSSL/1.0.2o (WinSSL) zlib/1.2.11 WinIDN libssh2/1.7.0 nghttp2/1.31.0

I am executing tesseract from python via
pytesseract version 0.3.4

Running the command

pytesseract.image_to_data(crop, output_type=ocr.Output.DATAFRAME, config='--psm 11 --oem 3')

is only successful with --oem 3; no other --oem value works.

E.g., running --oem 4 results in

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 437, in image_to_data
    }[output_type]()
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 433, in <lambda>
    args + [True], pandas_config,
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 407, in get_pandas_output
    return pd.read_csv(BytesIO(run_and_get_output(*args)), **kwargs)
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 272, in run_and_get_output
    with open(filename, 'rb') as output_file:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\...\\AppData\\Local\\Temp\\tess_j6_ns_9g.tsv'

How can I access other versions? Thank you!

There is no OEM 4. --oem 1 and --oem 3 should both work (with identical results). For the legacy engine (--oem 0 or --oem 2), you will need model files that include the legacy model.

OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.
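Since an out-of-range value such as --oem 4 only surfaces as a confusing FileNotFoundError in pytesseract, it may help to validate the mode before calling Tesseract. The following is a minimal sketch, not part of pytesseract: the names `OEM_MODES` and `build_config` are hypothetical helpers, and the mode descriptions are copied from the list above.

```python
# Hypothetical helper (not part of pytesseract): validate the OCR engine mode
# before building the config string that is passed to pytesseract.

# Modes as listed in the Tesseract help text quoted above.
OEM_MODES = {
    0: "Legacy engine only.",
    1: "Neural nets LSTM engine only.",
    2: "Legacy + LSTM engines.",
    3: "Default, based on what is available.",
}


def build_config(psm: int, oem: int = 3) -> str:
    """Return a config string such as '--psm 11 --oem 3'.

    Raises ValueError for engine modes that Tesseract does not define.
    """
    if oem not in OEM_MODES:
        valid = ", ".join(str(mode) for mode in sorted(OEM_MODES))
        raise ValueError(f"Unknown --oem value {oem}; valid values are {valid}")
    return f"--psm {psm} --oem {oem}"
```

With this in place, the call from the question could be written as `pytesseract.image_to_data(crop, output_type=pytesseract.Output.DATAFRAME, config=build_config(11, 1))`. Note that the check only rejects undefined modes; modes 0 and 2 can still fail at run time if the installed model files do not include the legacy engine.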