This project uses Optical Character Recognition (OCR) to extract text from images.
- Install Tesseract on your machine. For instructions, see: https://github.com/tesseract-ocr/tessdoc#installation
For Fedora, I needed to follow this guide: https://blog.mdda.net/oss/2016/08/10/tesseract-and-python-on-fedora and run the following:
sudo dnf install tesseract-devel
pip install tesserocr
- Create a virtual environment (optional, but recommended):
python3 -m venv venv
source venv/bin/activate
- Install the required Python dependencies:
pip install -r requirements.txt
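Once the steps above are done, it can help to confirm that the Tesseract command-line tool is reachable before running the script. The following is a minimal sketch using only the standard library. The function name `tesseract_version_line` is illustrative and not part of this project, and the check assumes the `tesseract` executable is on your PATH (it does not verify the `tesserocr` bindings themselves).

```python
import shutil
import subprocess


def tesseract_version_line():
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    executable = shutil.which("tesseract")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract builds print the version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version_line() or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH", revisit the installation instructions linked above.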
To run the script, simply execute the following from a terminal:
python main.py path_to_image1 path_to_image2
This prints the extracted text to the console by default. If you add the --to-file flag, it writes the text to separate files in a directory called output.
python main.py path_to_image1 path_to_image2 --to-file
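The source of main.py is not reproduced here, so the following is only a sketch of how a command line like the one above might be wired up with `argparse`. The `extract_text` callable is a hypothetical stand-in for the OCR step, and the output file naming (`<image name>.txt`) is an assumption rather than the script's confirmed behavior.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(description="Extract text from images with OCR.")
    parser.add_argument("images", nargs="+", help="paths to the images to process")
    parser.add_argument(
        "--to-file",
        action="store_true",
        help="write each result to a file in the output directory instead of printing it",
    )
    return parser


def run(argv, extract_text, output_dir="output"):
    """Process each image path; extract_text is a hypothetical callable (path -> str)."""
    args = build_parser().parse_args(argv)
    written = []
    for image in args.images:
        text = extract_text(image)
        if args.to_file:
            out_dir = Path(output_dir)
            out_dir.mkdir(parents=True, exist_ok=True)
            # Assumed naming scheme: one .txt file per image, named after the image.
            target = out_dir / (Path(image).stem + ".txt")
            target.write_text(text, encoding="utf-8")
            written.append(target)
        else:
            print(text)
    return written
```

Note that with this naming scheme two inputs sharing a base name (for example `scan.png` and `scan.jpg`) would overwrite each other's output file.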
The script expects a file named input.png in the same directory. To use a different image, replace 'input.png' in main.py with the path to your image.
The extracted text will be printed to the console.
This tool accepts multiple image formats since OpenCV's cv2.imread() function supports a variety of image formats including .bmp, .jpg, .jpeg, .png, .tif, .tiff, etc.
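Because `cv2.imread()` returns `None` instead of raising an error when it cannot read a file, it can be useful to filter inputs before handing them to the script. The sketch below is one way to do that with the standard library. The helper name and the extension set are illustrative; the set simply mirrors the formats listed above and is not exhaustive.

```python
from pathlib import Path

# Extensions named in this README; OpenCV itself can read more formats.
KNOWN_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def split_by_extension(paths):
    """Split paths into (accepted, skipped) based on a case-insensitive extension check."""
    accepted, skipped = [], []
    for path in paths:
        if Path(path).suffix.lower() in KNOWN_EXTENSIONS:
            accepted.append(path)
        else:
            skipped.append(path)
    return accepted, skipped
```

For example, `split_by_extension(["scan.PNG", "notes.txt"])` places `scan.PNG` in the accepted list and `notes.txt` in the skipped list.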
Tesseract works best when the image is of high quality and the text is horizontal. For harder cases involving rotation, skew, different languages, or noisy backgrounds, you may need additional image processing techniques or a different OCR tool.
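As an illustration of the kind of additional image processing that can help with noisy backgrounds, here is a small sketch using OpenCV: grayscale conversion, a median blur, and Otsu thresholding. It is not taken from main.py, the function name `preprocess_for_ocr` is hypothetical, and it does not address rotation or skew.

```python
def preprocess_for_ocr(image):
    """Return a denoised, binarized copy of a BGR image (as loaded by cv2.imread)."""
    # Imported lazily so this helper can live in a module that also loads
    # where OpenCV is not installed.
    import cv2

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # A small median blur removes salt-and-pepper noise while keeping edges fairly sharp.
    denoised = cv2.medianBlur(gray, 3)
    # Otsu's method picks the threshold automatically; the 0 passed here is ignored.
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The result is a single-channel image containing only the values 0 and 255. Whether this improves recognition depends on the input, so it is worth comparing OCR output with and without it.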
For more information on how to use this tool and how it works, see the following documentation: