This repository contains Python code for Optical Character Recognition (OCR), allowing the extraction of text content from images and other graphical representations.
- Image preprocessing to enhance OCR accuracy.
- Text extraction from various image formats (e.g., PNG, JPEG, TIFF).
- Support for multiple languages.
- Detailed error handling and logging for improved stability.
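
The features above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes the `pytesseract` and Pillow packages are used, and the names `extract_text`, `build_lang_arg`, and `SUPPORTED_EXTENSIONS` are hypothetical.

```python
import logging
from pathlib import Path

logger = logging.getLogger("ocr_sketch")

# Extensions mirror the formats named in the feature list (PNG, JPEG, TIFF).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_lang_arg(languages):
    """Join Tesseract language codes into the 'eng+fra' form Tesseract accepts."""
    if not languages:
        raise ValueError("at least one language code is required")
    return "+".join(languages)


def extract_text(image_path, languages=("eng",)):
    """Preprocess an image and return the recognized text (hypothetical helper)."""
    path = Path(image_path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image format: {path.suffix!r}")

    # Imported lazily so this module still loads when the third-party
    # packages (an assumption of this sketch) are not installed.
    from PIL import Image, ImageOps
    import pytesseract

    try:
        with Image.open(path) as image:
            # Simple preprocessing: convert to grayscale, then stretch contrast.
            prepared = ImageOps.autocontrast(ImageOps.grayscale(image))
            return pytesseract.image_to_string(
                prepared, lang=build_lang_arg(languages)
            )
    except (OSError, pytesseract.TesseractError):
        # Log the failure with a traceback, then let the caller decide.
        logger.exception("OCR failed for %s", path)
        raise
```

A call such as `extract_text("scan.png", languages=("eng", "fra"))` would then run Tesseract with both language models, provided the matching language data is installed.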
- On macOS, install Tesseract using Homebrew: `brew install tesseract`
- Download the Tesseract installer for Windows from the official GitHub repository.
- Run the installer and follow the installation instructions.
- Add Tesseract to the system PATH:
  - Open the Start menu and search for "Environment Variables".
  - Click on "Edit the system environment variables".
  - Click the "Environment Variables" button.
  - Under "System variables", scroll down and select "Path", then click "Edit".
  - Click "New" and add the Tesseract installation path (e.g., `C:\Program Files\Tesseract-OCR`).
- Verify the installation in Command Prompt, for example by running `tesseract --version`.
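
The same check can also be done from Python, which may be handy before running any OCR code. The sketch below uses only the standard library; the function names `find_tesseract` and `tesseract_version` are hypothetical and not part of this repository.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path to the executable if it is on PATH, else None."""
    return shutil.which(executable)


def tesseract_version(executable="tesseract"):
    """Return the first line of `tesseract --version`, or None if not found."""
    path = find_tesseract(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some releases print the version to stderr
        universal_newlines=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(tesseract_version() or "Tesseract was not found on PATH")
```

If this reports that Tesseract was not found, the PATH entry added in the previous step is the first thing to recheck; a new terminal session is usually needed before PATH changes take effect.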
- Python 3.6+
- Install dependencies using pip: `pip install -r requirements.txt`
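
After installing, a quick sanity check of the environment might look like the sketch below. It uses only the standard library. The module names in `assumed_modules` are assumptions, since the contents of `requirements.txt` are not listed here, and the helper names are hypothetical.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the requirements


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Assumed import names; adjust to match requirements.txt.
    assumed_modules = ["PIL", "pytesseract"]
    if not python_is_supported():
        print("Python 3.6 or newer is required")
    missing = missing_modules(assumed_modules)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All assumed modules are importable")
```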
If you would like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or improvement.
- Make your changes and submit a pull request.
This project is licensed under the MIT License.