Thai ID OCR App with Django, PyTesseract, and OpenCV

Overview

Welcome to the Thai ID OCR App, a Django-based application that performs Optical Character Recognition (OCR) on Thai ID card images. PyTesseract is used for OCR, and OpenCV is employed for image preprocessing to enhance readability. The application allows users to extract information from Thai ID cards, save the results in a database, and perform operations such as updating or deleting stored ID cards.

Link

Live deployment: https://ocr-thai-production-7e36.up.railway.app/

Technologies Used

  1. Django for web application development
  2. PyTesseract for OCR
  3. OpenCV for image preprocessing
  4. Database: SQLite (default with Django)
  5. Other dependencies: See requirements.txt

Setup Instructions

  1. Install Tesseract OCR and add it to your PATH; the project will not run without it.

    For macOS:

    brew install tesseract
    

    For Windows: download the Tesseract installer from the following link and run the installation wizard.

    https://github.com/UB-Mannheim/tesseract/wiki
    
  2. Clone the repository:

    git clone https://github.com/sparsh-kumar7/Qoala
    cd Qoala
    
    
  3. Create a virtual environment:

    python -m venv venv
    
  4. Activate the virtual environment:

    For macOS or Linux:

    source venv/bin/activate
    

    For Windows:

    venv\Scripts\activate
    
    
  5. Install dependencies:

    pip install -r requirements.txt
    
    
  6. Run migrations:

    python manage.py migrate
    
    
  7. Start the Django development server:

    python manage.py runserver
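
Step 1 requires Tesseract to be on your PATH, and a missing binary is a common reason for the app failing on first run. The snippet below is a minimal sketch of how that could be checked from Python before starting the server. The function name `find_tesseract` is hypothetical and is not part of this project.

```python
import shutil
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract was not found on PATH; install it before running the app.")
    else:
        print(f"Tesseract found at {path}")
```

If Tesseract is installed but cannot be added to PATH, pytesseract also accepts an explicit location through `pytesseract.pytesseract.tesseract_cmd`.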
    
    

User Interface

  1. Upload a Thai ID card image (PNG, JPEG, JPG) for OCR.
  2. View the extracted information (name, last name, identification number, date of birth, date of issue, date of expiry) in JSON format.
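
As an illustration of how raw OCR text might be turned into the JSON fields listed above, here is a minimal sketch. It assumes the OCR output contains the English labels printed on the card, one per line; the patterns, the key names, the function name `parse_id_text`, and the sample text are all hypothetical and may not match what the app actually does.

```python
import json
import re

# Hypothetical patterns, assuming English labels appear one per line in the OCR text.
FIELD_PATTERNS = {
    "identification_number": r"\b(\d(?:[ -]?\d){12})\b",  # 13 digits, optional separators
    "name": r"^[ \t]*Name[ \t]+(.+)$",
    "last_name": r"^[ \t]*Last[ \t]*name[ \t]+(.+)$",
    "date_of_birth": r"^[ \t]*Date of Birth[ \t]+(.+)$",
    "date_of_issue": r"^[ \t]*Date of Issue[ \t]+(.+)$",
    "date_of_expiry": r"^[ \t]*Date of Expiry[ \t]+(.+)$",
}


def parse_id_text(text: str) -> dict:
    """Extract the six fields from OCR text; fields that are not found are None."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE)
        value = match.group(1).strip() if match else None
        if field == "identification_number" and value:
            value = re.sub(r"\D", "", value)  # keep digits only
        result[field] = value
    return result


if __name__ == "__main__":
    # Made-up sample text, not taken from a real card.
    sample = (
        "Identification Number 1 2345 67890 12 3\n"
        "Name Mr. Somchai\n"
        "Last name Jaidee\n"
        "Date of Birth 1 Jan. 1990\n"
        "Date of Issue 5 Feb. 2020\n"
        "Date of Expiry 31 Dec. 2028\n"
    )
    print(json.dumps(parse_id_text(sample), ensure_ascii=False, indent=2))
```

Returning `None` for a field that was not found keeps the JSON shape stable, so the caller can decide how to report an incomplete read.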

Advanced Features

  1. Error handling for unreadable or unclear ID cards.
  2. Code comments for better understanding.
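
Error handling for unreadable cards could look roughly like the sketch below. It assumes the extracted fields arrive as a dictionary and treats a card as unreadable when any field is missing or the identification number is not exactly 13 digits. The names `UnreadableCardError`, `REQUIRED_FIELDS`, and `validate_fields` are hypothetical, not taken from the project.

```python
import re

# Hypothetical key names for the six fields the app extracts.
REQUIRED_FIELDS = (
    "name",
    "last_name",
    "identification_number",
    "date_of_birth",
    "date_of_issue",
    "date_of_expiry",
)


class UnreadableCardError(ValueError):
    """Raised when the OCR result is incomplete or looks malformed."""


def validate_fields(fields: dict) -> dict:
    """Return the fields unchanged if they look complete, otherwise raise."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise UnreadableCardError("Could not read: " + ", ".join(missing))
    # Thai identification numbers have 13 digits.
    if not re.fullmatch(r"[0-9]{13}", str(fields["identification_number"])):
        raise UnreadableCardError("Identification number must have 13 digits")
    return fields
```

A view could catch `UnreadableCardError` and respond with a message asking the user to upload a clearer image, rather than saving a partial record.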

Snapshots

[Three screenshots of the application, taken on 2023-12-26]

During preprocessing, each image goes through the following steps:

  1. Invert the image.
  2. Convert the image to greyscale.
  3. Convert it to black and white using thresholding.
  4. Reinvert the image.
  5. Add a border so that text near the corners can be read.