Optical-Character-Recognizer

A program that reads an image and predicts what's written in it (OCR)


Optical Character Recognition

This project reads the text present in an image and predicts what's written in it.

Process

The code first divides the image into multiple segments, each containing a single character. A pretrained model is then run on each segment to predict the character it contains, and the predicted characters are output as a string.
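
For intuition, here is a minimal sketch of what such a segment-then-classify pipeline can look like. The contour-based segmentation, the 28x28 input size, and the character label order below are illustrative assumptions, not the exact code in 'ocr.py':

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    # Sketch: split the image into single-character boxes, then classify each box
    # with the pretrained model (the weights file name is from this repo; the rest is assumed).
    model = load_model('fmodelwts.h5')
    CHARS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # assumed label order

    def recognize(image):
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Sort character boxes left to right so the output string reads in order
        boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])

        text = ''
        for x, y, w, h in boxes:
            char_img = cv2.resize(thresh[y:y + h, x:x + w], (28, 28))
            probs = model.predict(char_img.reshape(1, 28, 28, 1) / 255.0, verbose=0)
            text += CHARS[int(np.argmax(probs))]
        return text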

├─ model
│  └─ fmodelwts.h5
├─ src
│  ├─ model.py
│  └─ ocr.py
├─ .gitignore
├─ LICENSE
└─ README.md

Setting up the OCR

Let's start by cloning the repository:

$ git clone https://github.com/Mastermind0100/Optical-Character-Recognizer.git
$ cd Optical-Character-Recognizer

Great! You are set up with the repository.
Let's dive into it!

How to Use the OCR

  1. Copy the following files into the directory you are using for your project:

    • ocr.py
    • fmodelwts.h5
  2. In your code, add the following lines:

    from ocr import predict
    predict(image)
  3. This will print the text detected in the image you pass to the 'predict' function.

  4. If you want the function to simply return the predicted text instead of printing it, make the following change to line 78 of 'ocr.py':

    return final

    Your code then needs to capture the returned text in a variable, so the call from Step 2 becomes:

    text = predict(image)
  • The 'image' that you pass to the predict function is the image data loaded with OpenCV's imread function (see the usage sketch after these notes). But you knew that, right?

  • Note that this is a relatively basic OCR. It does not detect spaces for you or segment words in a sentence. While work on this is in progress, you can do some image pre-processing to make it work for your use case.
    Watch out for further updates!
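
Putting the steps above together, a minimal usage sketch might look like this. The file name 'plate1.jpg' and the thresholding step are placeholders/examples, not requirements:

    import cv2
    from ocr import predict

    # Load the input image with OpenCV; this is the 'image' the predict function expects.
    image = cv2.imread('plate1.jpg')

    # Optional: some pre-processing (e.g. binarization) can help with noisy photos, e.g.
    # gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # _, image = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # By default the function prints the detected text.
    predict(image)

    # After the 'return final' change on line 78 of ocr.py, capture the result instead:
    # text = predict(image)
    # print(text)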

Want to train on your own Dataset?

Go ahead! Fire up 'model.py' and use your own dataset. Hopefully the code is self-explanatory. P.S. The dataset I used was the NIST dataset. Download the 2nd Edition and have fun manually arranging the data :)
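
If you do, a rough outline of such a training script is sketched below. The folder layout (one sub-directory per character class), the 28x28 image size, and the network shape are all assumptions, not necessarily what 'model.py' does:

    import tensorflow as tf

    IMG_SIZE = (28, 28)   # assumed input size
    NUM_CLASSES = 36      # e.g. digits + uppercase letters

    # Assumed layout: dataset/<class name>/<images>, one folder per character.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        'dataset', color_mode='grayscale', image_size=IMG_SIZE, batch_size=64)

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255, input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_ds, epochs=10)

    # Save the trained model under the file name ocr.py expects.
    model.save('fmodelwts.h5')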

Output

The original photo looks like this:

plate1

Mid-processing output:

up1

Final text output (Spyder console):

up2

License

This project is licensed under the MIT License - see the LICENSE file for details.