Optical-Character-Recognizer

A program that reads an image and predicts what's written in it (OCR)


Optical Character Recognition

This project reads the text present in an image and predicts what's written in it.

Process

The code first divides the image into multiple segments, each containing a single character. A pretrained model is then run on each segment to predict the character it contains, and the predicted characters are output as a string.
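
For intuition, here is a minimal sketch of what such a segment-then-classify pipeline can look like. The contour-based segmentation, the 28x28 input size, and the character label order below are illustrative assumptions, not the exact code in 'ocr.py':

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    # Sketch: split the image into single-character boxes, then classify each box
    # with the pretrained model (the weights file name is from this repo; the rest is assumed).
    model = load_model('fmodelwts.h5')
    CHARS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # assumed label order

    def recognize(image):
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Sort character boxes left to right so the output string reads in order
        boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])

        text = ''
        for x, y, w, h in boxes:
            char_img = cv2.resize(thresh[y:y + h, x:x + w], (28, 28))
            probs = model.predict(char_img.reshape(1, 28, 28, 1) / 255.0, verbose=0)
            text += CHARS[int(np.argmax(probs))]
        return text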

├─ model
│  └─ fmodelwts.h5
├─ src
│  ├─ model.py
│  └─ ocr.py
├─ .gitignore
├─ LICENSE
└─ README.md

Setting up the OCR

Let's start by cloning the repository:

$ git clone https://github.com/Mastermind0100/Optical-Character-Recognizer.git
$ cd Optical-Character-Recognizer

Great! You are set up with the repository.
Let's dive into it!

How to Use the OCR

  1. Copy the following files into the directory you are using for your project:

    • ocr.py
    • fmodelwts.h5
  2. In your code, add the following lines:

    from ocr import predict
    predict(image)
  3. This will print the text detected in the image you pass to the 'predict' function.

  4. If you want the function to simply return the predicted text instead of printing it, make the following change to line 78 of 'ocr.py':

    return final

    Your code then needs to capture the returned text in a variable, so the call from Step 2 becomes:

    text = predict(image)
  • The 'image' that you pass to the predict function is the image data loaded with OpenCV's imread function (see the usage sketch after these notes). But you knew that, right?

  • Note that this is a relatively basic OCR. It does not detect spaces for you or segment words in a sentence. While work on this is in progress, you can do some image pre-processing to make it work for your use case.
    Watch out for further updates!
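
Putting the steps above together, a minimal usage sketch might look like this. The file name 'plate1.jpg' and the thresholding step are placeholders/examples, not requirements:

    import cv2
    from ocr import predict

    # Load the input image with OpenCV; this is the 'image' the predict function expects.
    image = cv2.imread('plate1.jpg')

    # Optional: some pre-processing (e.g. binarization) can help with noisy photos, e.g.
    # gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # _, image = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # By default the function prints the detected text.
    predict(image)

    # After the 'return final' change on line 78 of ocr.py, capture the result instead:
    # text = predict(image)
    # print(text)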

Want to train on your own Dataset?

Go ahead! Fire up 'model.py' and use your own dataset. Hopefully the code is self-explanatory. P.S. The dataset I used was the NIST dataset. Download the 2nd Edition and have fun manually arranging the data :)
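
If you do, a rough outline of such a training script is sketched below. The folder layout (one sub-directory per character class), the 28x28 image size, and the network shape are all assumptions, not necessarily what 'model.py' does:

    import tensorflow as tf

    IMG_SIZE = (28, 28)   # assumed input size
    NUM_CLASSES = 36      # e.g. digits + uppercase letters

    # Assumed layout: dataset/<class name>/<images>, one folder per character.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        'dataset', color_mode='grayscale', image_size=IMG_SIZE, batch_size=64)

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255, input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_ds, epochs=10)

    # Save the trained model under the file name ocr.py expects.
    model.save('fmodelwts.h5')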

Output

The original photo looks like this:

plate1

Mid-processing output:

up1

Final text output (Spyder console):

up2

License

This project is licensed under the MIT License - see the LICENSE file for details.