Running issue with simple.png example under Win 10
eddydev03 opened this issue · 4 comments
Dear Eihli, Your program will help me in the future for personal purposes. I am running it on Win 10. I followed all the steps to simply extract data from images, but I can't figure out why it does not run through.
Here is the output when I run py -m table_ocr.demo https://raw.githubusercontent.com/eihli/image-table-ocr/master/resources/test_data/simple.png
Running extract_tables.main([C:\Users\MAGICB~1\AppData\Local\Temp\demo_cp3ejb98\simple.png]).
Extracted the following tables from the image:
[('C:\Users\\AppData\Local\Temp\demo_cp3ejb98\simple.png', ['C:\Users\\AppData\Local\Temp\demo_cp3ejb98\simple\table-000.png'])]
Processing tables for C:\Users*\AppData\Local\Temp\demo_cp3ejb98\simple.png.
Processing table C:\Users*\AppData\Local\Temp\demo_cp3ejb98\simple\table-000.png.
Traceback (most recent call last):
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 255, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 947, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users*****\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1416, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users*****\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\demo\__main__.py", line 51, in <module>
    csv_output = main(sys.argv[1])
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\demo\__main__.py", line 32, in main
    ocr = [
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\demo\__main__.py", line 33, in <listcomp>
    table_ocr.ocr_image.main(cell, None)
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\ocr_image\__init__.py", line 31, in main
    txt = ocr_image(cropped, " ".join(tess_args))
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\ocr_image\__init__.py", line 83, in ocr_image
    return pytesseract.image_to_string(
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 409, in image_to_string
    return {
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 412, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 287, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users**\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 259, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
I have tesseract installed, so I don't understand:
PS C:\Users*\AppData\Local\Programs\Python\Python39> py -m pip install tesseract
Requirement already satisfied: tesseract in c:\users*\appdata\local\programs\python\python39\lib\site-packages (0.1.3)
Thanks for your help.
Eddy
The last frame of that traceback points to line 259 in the file pytesseract/pytesseract.py.
Let's look at that line. https://github.com/madmaze/pytesseract/blob/a98ea7530711ac1319f6504857aa9318d63a2774/pytesseract/pytesseract.py#L256
try:
    proc = subprocess.Popen(cmd_args, **subprocess_args())
except OSError as e:
    if e.errno != ENOENT:
        raise e
    raise TesseractNotFoundError()
It's catching an OSError and, when the error code is ENOENT ("no such file or directory"), throwing a TesseractNotFoundError in its place. The error you end up seeing never says what the original OSError was; the code assumes that a missing file can only mean a missing Tesseract. Since you say you have Tesseract installed, perhaps the underlying OSError is about something else.
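A quick way to see what kind of OSError that except branch is dealing with, without touching pytesseract, is to ask Popen to start a program that doesn't exist. This is a standalone sketch; launch_error and the program name are made up for illustration.

```python
import errno
import subprocess
from typing import Optional


def launch_error(command: str) -> Optional[OSError]:
    """Try to start `command` and return the OSError Popen raises, if any."""
    try:
        proc = subprocess.Popen(
            [command, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError as e:
        return e
    proc.wait()  # the program exists; let it finish
    return None


# A program name that (almost certainly) does not exist on this machine.
err = launch_error("definitely-not-a-real-program-xyz")
print(type(err).__name__, err.errno == errno.ENOENT)
```

On Windows this is the same "[WinError 2] The system cannot find the file specified" seen in the traceback; Python maps it to errno ENOENT, which is exactly the case pytesseract converts into TesseractNotFoundError.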
You could edit C:\Users**\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py and add a print(e) just before the raise TesseractNotFoundError() on line 259 (call it line 258.5) to see details about the OSError.
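Editing files in site-packages isn't strictly necessary. When an exception is raised inside an except block, Python stores the exception that was being handled on the new exception's __context__ attribute. The sketch below is a self-contained stand-in (fake_run_tesseract and the local TesseractNotFoundError class are hypothetical, not pytesseract's actual code) that mimics the quoted try/except and then recovers the original OSError.

```python
import errno


class TesseractNotFoundError(EnvironmentError):
    """Hypothetical stand-in for pytesseract's exception of the same name."""


def fake_run_tesseract() -> None:
    # Mimics the quoted try/except: a file-not-found OSError is
    # swapped for a friendlier exception.
    try:
        raise FileNotFoundError(errno.ENOENT, "The system cannot find the file specified")
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise e
        raise TesseractNotFoundError()


try:
    fake_run_tesseract()
except TesseractNotFoundError as exc:
    # Implicit exception chaining keeps the original error here.
    original = exc.__context__
    print(repr(original))
```

With the real library, catching pytesseract.TesseractNotFoundError and printing exc.__context__ should show the same thing, assuming it is raised from that except block as in the quoted source. The chained error is in fact already visible in the traceback above: the FileNotFoundError: [WinError 2] printed before "During handling of the above exception" is the OSError that print(e) would show.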
I'll jump ahead to what I expect you'll see if you do that.
Here is the entire function where the error is raised.
def run_tesseract(
    input_filename,
    output_filename_base,
    extension,
    lang,
    config='',
    nice=0,
    timeout=0,
):
    cmd_args = []
    if not sys.platform.startswith('win32') and nice != 0:
        cmd_args += ('nice', '-n', str(nice))
    cmd_args += (tesseract_cmd, input_filename, output_filename_base)
    if lang is not None:
        cmd_args += ('-l', lang)
    if config:
        cmd_args += shlex.split(config)
    if extension and extension not in {'box', 'osd', 'tsv', 'xml'}:
        cmd_args.append(extension)
    try:
        proc = subprocess.Popen(cmd_args, **subprocess_args())
    except OSError as e:
        if e.errno != ENOENT:
            raise e
        raise TesseractNotFoundError()
    with timeout_manager(proc, timeout) as error_string:
        if proc.returncode:
            raise TesseractError(proc.returncode, get_errors(error_string))
You'll see it's running proc = subprocess.Popen(cmd_args, **subprocess_args()). That line is trying to run the command stored in cmd_args.
What is cmd_args? cmd_args += (tesseract_cmd, input_filename, output_filename_base).
What is tesseract_cmd? tesseract_cmd = 'tesseract'
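To make that chain concrete, here is a simplified, standalone copy of how the quoted run_tesseract builds its command line (build_cmd_args is a hypothetical name; the nice and extension handling is left out).

```python
import shlex
from typing import List, Optional

# Default from the quoted source: a bare program name that the
# operating system has to find by searching PATH.
tesseract_cmd = "tesseract"


def build_cmd_args(input_filename: str, output_filename_base: str,
                   lang: Optional[str] = None, config: str = "") -> List[str]:
    """Simplified copy of how run_tesseract assembles cmd_args."""
    cmd_args = [tesseract_cmd, input_filename, output_filename_base]
    if lang is not None:
        cmd_args += ["-l", lang]
    if config:
        cmd_args += shlex.split(config)  # "--psm 7" -> ["--psm", "7"]
    return cmd_args


print(build_cmd_args("cell.png", "out", lang="eng", config="--psm 7"))
# ['tesseract', 'cell.png', 'out', '-l', 'eng', '--psm', '7']
```

The first element is only the bare name tesseract, so Popen can succeed only if the operating system finds a tesseract executable on PATH. pytesseract's README also describes an alternative to editing PATH: assigning the full path of the executable to pytesseract.pytesseract.tesseract_cmd.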
Try running the command tesseract from your terminal and you'll probably get an error. That will probably be the same error your code is throwing, namely, that you don't have tesseract installed.
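From Python, shutil.which performs roughly the same PATH lookup your terminal does when you type tesseract. A minimal check using only the standard library (find_executable is a hypothetical helper name):

```python
import shutil
from typing import Optional


def find_executable(name: str = "tesseract") -> Optional[str]:
    """Return the full path the OS would run for `name`, or None if it is not on PATH."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print("tesseract is not on PATH; pytesseract would likely raise TesseractNotFoundError")
else:
    print("tesseract found at", path)
```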
So why does py -m pip install tesseract show the requirement already satisfied?
Because you have the tesseract Python package installed, which is totally different from the Tesseract software. This is the Python package: https://pypi.org/project/tesseract/. This is the software: https://tesseract-ocr.github.io/tessdoc/Downloads.html.
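The two things named tesseract can be checked separately. This sketch (tesseract_status is a hypothetical helper) asks the installed-package metadata for a distribution named tesseract, which is what pip's "Requirement already satisfied" line was reporting, and asks the PATH lookup for the OCR executable.

```python
import shutil
from importlib import metadata
from typing import Dict, Optional


def tesseract_status() -> Dict[str, Optional[str]]:
    """Report the PyPI package and the OCR program separately."""
    try:
        # The PyPI distribution installed by `pip install tesseract`.
        pkg_version: Optional[str] = metadata.version("tesseract")
    except metadata.PackageNotFoundError:
        pkg_version = None
    return {
        "pip_package_version": pkg_version,
        # The Tesseract OCR program that pytesseract launches via Popen.
        "ocr_executable": shutil.which("tesseract"),
    }


print(tesseract_status())
```

On the machine in this issue, one would expect a version string in the first entry and None in the second until the Tesseract software is installed and on PATH.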
Good evening Eihli, thank you for your quick answer. I appreciate it.
So I tried adding print(e) on line 258.5 of C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py, but when I run the command it shows nothing beyond the same error. In a test, the print runs correctly outside the try: block but not inside it. So I still can't see the underlying error, I suppose. Concerning Tesseract, you are completely right: I had the Python package installed, but not the software. My question is: which path should I install it to when I run the .exe? pytesseract.py still says it is not in the right path. I downloaded Tesseract 3.02 (the last official version for Windows). Do I need to go for the unofficial version 5.0.0?
Sincerely,
I'm not familiar enough with Windows to be of much help with that part.
Doing searches for phrases like "windows pytesseract can't find tesseract exe path" should take you down the correct path.
For example, I found this issue that seems to touch on the problem you're having: maxenxe/HQ-Trivia-Bot-NOT-MAINTAINED-#51
It has the following comments:
My path doesn't look like yours @maxenxe I'm on windows 10.
I'm getting the same error and I can't find a clear answer.
That's the only thing missing for me. It keeps saying tesseract not recognized as internal or external.
Can someone tell me how to add it on PATH on windows 10.
Control Panel > System and Security > System > Advanced system settings > Advanced > Environment variables > PATH > New
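The Environment Variables dialog changes PATH permanently, but only for programs started after the change. As a per-process alternative, the sketch below prepends a folder to PATH inside the running Python process (add_to_process_path is hypothetical, and C:\Program Files\Tesseract-OCR is an assumed install folder; use whichever folder actually contains tesseract.exe on your machine).

```python
import os
import shutil


def add_to_process_path(directory: str) -> None:
    """Prepend `directory` to PATH for this process and the children it starts.

    Nothing is persisted: other programs, and the next run of this one,
    still see the PATH stored in the system settings.
    """
    current = os.environ.get("PATH", "")
    os.environ["PATH"] = directory + os.pathsep + current if current else directory


# Assumed install folder (hypothetical); point this at the folder
# that actually contains tesseract.exe.
add_to_process_path(r"C:\Program Files\Tesseract-OCR")
print(shutil.which("tesseract"))
```

Child processes started afterwards, including the one pytesseract launches with Popen, inherit the modified PATH. Programs that were already running, such as open terminals and the editors that launched them, keep the PATH they started with, so a freshly edited PATH can appear not to work until they are restarted.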
Hello Eihli, I feel kind of stupid. I had to restart the computer for the PATH change to take effect. Everything is fine now. Thank you for your help. Have a good day.