Document-Scanner

A document scanner that extracts a document from an image and then runs OCR (Optical Character Recognition) on it.

It involves 3 steps:

How to use it:

End Note - This project is heavily inspired by the PyImageSearch computer vision course material on document segmentation and OCR.

I will soon be building my own YOLO-based model that segments documents using neural networks.

OCR

To install Tesseract on Debian/Ubuntu, run:

sudo apt-get install tesseract-ocr

To run OCR on the extracted image, open a terminal in the output folder and run:

tesseract image_name out

The output will be saved in a file named out.txt.
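If you would rather trigger the same command from a script, the following is a minimal sketch, assuming Python is available and `tesseract` is on your PATH. The helper names `build_tesseract_command` and `run_ocr` are hypothetical and not part of this project; they only wrap the command shown above.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base="out"):
    """Return the argument list for `tesseract <image> <output_base>`."""
    return ["tesseract", str(image_path), str(output_base)]


def run_ocr(image_path, output_base="out"):
    """Run Tesseract on an image and return the recognized text."""
    # Fail early with a readable message if Tesseract is not installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH; install tesseract-ocr first")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("image_name")` from the output folder should write out.txt and return its contents as a string.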