Tesseract 5.0-alpha only working with --oem 3
I am running tesseract under Windows 10 Pro with the following version:
tesseract v5.0.0-alpha.20200328
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
 Found libcurl/7.59.0 OpenSSL/1.0.2o (WinSSL) zlib/1.2.11 WinIDN libssh2/1.7.0 nghttp2/1.31.0
I am calling Tesseract from Python via pytesseract version 0.3.4.
Running the command
pytesseract.image_to_data(crop, output_type=ocr.Output.DATAFRAME, config='--psm 11 --oem 3')
is only successful with --oem 3; no other value works. For example, running with --oem 4 results in:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 437, in image_to_data
    }[output_type]()
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 433, in <lambda>
    args + [True], pandas_config,
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 407, in get_pandas_output
    return pd.read_csv(BytesIO(run_and_get_output(*args)), **kwargs)
  File "C:\Users\Christian\Anaconda3\envs\bbcDataConverter\lib\site-packages\pytesseract\pytesseract.py", line 272, in run_and_get_output
    with open(filename, 'rb') as output_file:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\...\\AppData\\Local\\Temp\\tess_j6_ns_9g.tsv'
How can I access other versions? Thank you!
There is no OEM 4. --oem 1 and --oem 3 should work (with identical results). For the legacy engine (--oem 0 or --oem 2), you will need model files that include it.
OCR Engine modes:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.
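Since an unsupported value such as --oem 4 surfaces only as a confusing FileNotFoundError, it may help to validate the mode before calling pytesseract. Below is a minimal sketch in Python; `OEM_MODES` and `build_config` are hypothetical names introduced here for illustration and are not part of pytesseract or Tesseract.

```python
# OCR engine modes from the list above; any other value is rejected.
OEM_MODES = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}


def build_config(psm: int, oem: int) -> str:
    """Return a config string for pytesseract, rejecting unknown --oem values.

    build_config is a hypothetical helper, not part of pytesseract.
    """
    if oem not in OEM_MODES:
        valid = ", ".join(str(mode) for mode in sorted(OEM_MODES))
        raise ValueError(
            f"--oem {oem} is not a valid engine mode; choose one of {valid}"
        )
    return f"--psm {psm} --oem {oem}"


# Intended use (requires pytesseract and an image named `crop`):
# pytesseract.image_to_data(crop, output_type=pytesseract.Output.DATAFRAME,
#                           config=build_config(11, 1))
```

This only checks that the number is one of the four listed modes. It does not check whether the installed model files include the legacy engine, so --oem 0 and --oem 2 can still fail with LSTM-only models.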