An open-source OCR engine developed without any ML libraries.

Primary language: Python

readit

Overview:

readit is an Optical Character Recognition (OCR) engine developed using Python, bash and PHP. It reads images that contain characters and words and converts them to digital text using machine learning logic.

If you run into any issue, please feel free to get in touch by email. You are also welcome to update the code to make readit a better reader :)

Description:

The objective of the project was to design an OCR engine from scratch, without using any ML library or API. The application can extract text from images of both digital and handwritten words. Through a web API, it can also process multiple requests at a time in a scheduled manner, using web and Android interfaces (the Android interface is not included here). The application is built with Python, bash, an Apache server and Android Studio. It also generates logs for further data analysis. In short, the project combines machine learning for computer vision with web architecture.
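
To give a feel for what "no ML library" can mean in practice, here is a minimal sketch of a feed-forward pass written with only the Python standard library. It is not readit's actual network: the layer sizes (784 inputs, 30 hidden units, 10 outputs), the sigmoid activation, and the names `init_layer`, `forward` and `predict` are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one layer; weights are small random values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, pixels):
    """Propagate a flat list of pixel intensities through every layer."""
    activations = pixels
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def predict(layers, pixels):
    """Return the index of the output unit with the highest activation."""
    outputs = forward(layers, pixels)
    return max(range(len(outputs)), key=outputs.__getitem__)


# Hypothetical 28x28 input image (all black) and an untrained two-layer network.
rng = random.Random(0)
layers = [init_layer(784, 30, rng), init_layer(30, 10, rng)]
image = [0.0] * 784
print(predict(layers, image))
```

With untrained weights the predicted class is meaningless; the point is only that the forward pass needs nothing beyond lists, loops and `math.exp`.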

How to Use:

1. Clone or download the project files.
2. Install the XAMPP server.
3. Move the folder “readIt1.0” from .../readit/web_pages to htdocs in your XAMPP installation directory.
4. Change the file paths inside the code to match your system's directories as required.
(The paths are written for a UNIX environment. Yours may differ depending on your OS, so change them accordingly.)
5. Open XAMPP and start the Apache server.
6. Open index.php in your favourite browser.
7. Congratulations! You have done it :)
8. Enjoy and enhance readit :)
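
For the path changes in step 4, one possible approach is to build every path from a single base directory instead of editing each hard-coded string. The sketch below is only a suggestion: the helper `project_path` and the environment variable `READIT_HOME` are hypothetical and do not exist in the project, and the default location is an assumption.

```python
import os
from pathlib import Path


def project_path(*parts, base=None):
    """Join path parts onto one base directory.

    The base is taken from the `base` argument, then from the hypothetical
    READIT_HOME environment variable, then from ~/readit as a fallback.
    """
    root = Path(base or os.environ.get("READIT_HOME") or Path.home() / "readit")
    return root.joinpath(*parts)


# Example: resolve a file under the project on any OS.
print(project_path("web_pages", "readIt1.0", "index.php"))
```

Because `pathlib` picks the right separator for the host OS, the same code would work unchanged on UNIX and Windows once the base directory is set.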

Note:

The training data and dataset were not uploaded because of GitHub's file size limit. Please download a dataset from MNIST or any other source you prefer, then train the network on it (the code to train and test the network is already written). You may need to adapt the code to your data. Other programs such as “generate_test_file”, “generate_train_file”, etc. (you can find these in the “strt_prog” directory) are written to generate your dataset file in .pkl format.
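
As a rough illustration of what writing a dataset to a .pkl file involves, the sketch below stores images and labels with Python's standard `pickle` module and reads them back. The dictionary layout (`"images"`, `"labels"`) and the function names `save_dataset` and `load_dataset` are assumptions made for this example; the scripts in “strt_prog” may use a different structure.

```python
import os
import pickle
import tempfile


def save_dataset(path, images, labels):
    """Write images and labels to a single .pkl file."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    with open(path, "wb") as fh:
        pickle.dump({"images": images, "labels": labels}, fh,
                    protocol=pickle.HIGHEST_PROTOCOL)


def load_dataset(path):
    """Read back the (images, labels) pair written by save_dataset."""
    with open(path, "rb") as fh:
        data = pickle.load(fh)
    return data["images"], data["labels"]


# Two tiny made-up "images" (flat pixel lists) with their labels.
images = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
labels = [3, 7]

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "train.pkl")
    save_dataset(pkl_path, images, labels)
    loaded_images, loaded_labels = load_dataset(pkl_path)
    print(len(loaded_images), loaded_labels)
```

Only unpickle files you created yourself or trust, since `pickle.load` can execute arbitrary code from a malicious file.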

Behind the scenes:

The project was developed by me and souravdkv.