MacOS uses Tesseract and not Tesseract-OCR

Question

Description of the bug

pymupdf/__init__.py in ?(tessdata)
  17818     # Unix-like systems:
  17819     cp = subprocess.run("whereis tesseract-ocr", shell=1, capture_output=1, check=0, text=True)
  17820     response = cp.stdout.strip().split()
  17821     if cp.returncode or len(response) != 2:  # if not 2 tokens: no tesseract-ocr
> 17822         raise RuntimeError("No tessdata specified and Tesseract is not installed")
  17823 
  17824     # search tessdata in folder structure
  17825     dirname = response[1]  # contains tesseract-ocr installation folder

RuntimeError: No tessdata specified and Tesseract is not installed

How to reproduce the bug

PyMuPDF installation command:
uv add pymupdf

Script:

import pymupdf

doc = pymupdf.open("input.pdf")  # placeholder path; any PDF will do
for page in doc:
    textPage = page.get_textpage_ocr()
    print(textPage.extract_text())

Running this script raises the RuntimeError shown above.

On macOS, Tesseract is installed with brew install tesseract, and there is no package named tesseract-ocr, so the lookup for tesseract-ocr finds nothing.

Proof that Tesseract is installed:
tesseract: /opt/homebrew/bin/tesseract
tesseract-ocr:
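The traceback shows why this fails: PyMuPDF 1.26.1 runs whereis tesseract-ocr and treats Tesseract as installed only when the output splits into exactly two tokens (the name and its folder). A minimal sketch that re-creates just that token test on the two lines above; the return-code check from the traceback is left out, and the helper name is hypothetical:

```python
# The two lookup lines reported in this issue.
OUTPUTS = {
    "tesseract": "tesseract: /opt/homebrew/bin/tesseract",
    "tesseract-ocr": "tesseract-ocr:",
}


def passes_pymupdf_check(stdout: str) -> bool:
    """Hypothetical helper mirroring the traceback: exactly 2 tokens means 'installed'."""
    return len(stdout.strip().split()) == 2


for name, out in OUTPUTS.items():
    print(f"{name}: {'found' if passes_pymupdf_check(out) else 'not found'}")
```

This prints "tesseract: found" followed by "tesseract-ocr: not found": the Homebrew binary would pass the test, but only the tesseract-ocr name is ever checked.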

PyMuPDF version

1.26.1

Operating system

MacOS

Python version

3.12

Answer 1 · 2025-06-18T12:34:17.000Z

Did you know you can fix this either by passing the tessdata folder directly or by setting the appropriate environment variable (TESSDATA_PREFIX) before starting your script?
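A minimal sketch of locating that folder on macOS, assuming the usual Homebrew layout where tessdata sits in share/tessdata under the same prefix as the tesseract binary (e.g. /opt/homebrew/share/tessdata). The function name find_tessdata is hypothetical, and this lookup order is an assumption for illustration, not PyMuPDF's own logic:

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_tessdata(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: return a tessdata folder, or None if none is found.

    Assumed lookup order (not PyMuPDF's own):
    1. TESSDATA_PREFIX, if it names an existing folder;
    2. <prefix>/share/tessdata beside the "tesseract" binary on PATH,
       the usual Homebrew layout (e.g. /opt/homebrew/share/tessdata).
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if prefix and Path(prefix).is_dir():
        return str(Path(prefix))
    # Search for "tesseract", the name Homebrew installs, not "tesseract-ocr".
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    # e.g. /opt/homebrew/bin/tesseract -> /opt/homebrew/share/tessdata
    candidate = Path(exe).parent.parent / "share" / "tessdata"
    return str(candidate) if candidate.is_dir() else None


if __name__ == "__main__":
    print(find_tessdata())
```

The result can then be passed as page.get_textpage_ocr(tessdata=find_tessdata()), or exported as TESSDATA_PREFIX before the script starts, matching the two options in the answer.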

Answer 2 · 2025-08-25T14:37:35.000Z

Fixed in PyMuPDF-1.26.4.