PriNumco

Algorithmically generating Bengali digits and classification using MobileNetV2 for Bangladeshi license plate detection

Dataset made-with-python MIT license stability-experimental Code style: black

Several initiatives have labeled and aggregated Bengali handwritten digit images with the aim of building robust digit recognition systems. However, deep learning models trained on handwritten digits do not generalize well to images of printed digits. PriNumco is a compilation of printed Bengali digit images that aims to provide a robust training and validation dataset for building recognition systems that classify printed digits sourced from license plates, billboards, banners and other real-life sources.

Dataset Summary

The script first uses 58 different Bengali fonts to generate 2320 base images (256 x 256 pixels) of the 10 digits, 232 images per digit. It then passes them through an augmentation pipeline to produce 200k training images. A similar procedure with a different augmentation pipeline was followed to generate 28k test images.
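The base-image count is consistent with rendering each digit once per font and per background color, given the four background colors named under List of Applied Augmentations. The short check below makes that arithmetic explicit. The split into fonts times backgrounds is an inference from the numbers in this README, not confirmed behavior of the scripts, and the function name is hypothetical.

```python
def base_image_count(num_fonts=58, num_backgrounds=4, num_digits=10):
    """Images produced when every digit is rendered once per font and background.

    Assumption: 4 backgrounds (white, yellow, sky blue, teal) per font.
    """
    return num_fonts * num_backgrounds * num_digits


total = base_image_count()
per_digit = total // 10
print(total, per_digit)  # 2320 232

# Average number of augmented training images derived from each base image,
# reading "200k" as 200,000.
print(round(200_000 / total, 1))  # 86.2
```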

Sample Images

Original Images

Augmented Images

Generated images are organized in the following hierarchy:

------------------------------------------------------------------------------------------------------
Folder Structure
------------------------------------------------------------------------------------------------------

.
├── dataset/
    ├── train/
        ├── 0/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 1/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 2/
            .....
    ├── test/
        ├── 0/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 1/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 2/
            .....
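As a quick sanity check on this layout, the following sketch counts the .jpg files in each class folder of a split. It assumes only the hierarchy shown above (dataset/train and dataset/test, one subfolder per digit); the function name is hypothetical and is not part of the repository.

```python
from pathlib import Path


def count_images_per_class(split_dir):
    """Return {class_name: number of .jpg files} for one split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
    return counts


if __name__ == "__main__":
    # Assumes the dataset folder sits in the current working directory.
    for split in ("train", "test"):
        split_dir = Path("dataset") / split
        if split_dir.is_dir():
            print(split, count_images_per_class(split_dir))
```

A balanced dataset should report roughly the same count for every digit folder within a split.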

List of Applied Augmentations

To mimic real-life images of Bengali digits, we generated the images with white, yellow, sky blue and teal backgrounds and used the Augmentor library to apply the following augmentations to both the train and test datasets:

  • gaussian_noise(probability=0.3, mean=0, std=20.0)
  • blur(probability=0.6, blur_type='random', radius=(0, 100), fixed_radius=3)
  • random_filter(probability=0.9, filter_type='random', size=5)
  • black_and_white(probability=1, threshold=128)
  • rotate(probability=0.3, max_left_rotation=25, max_right_rotation=25)
  • rotate90(probability=0.005)
  • rotate270(probability=0.005)
  • zoom(probability=0.1, min_factor=1.01, max_factor=1.03)
  • skew_tilt(probability=0.01, magnitude=1)
  • skew_left_right(probability=0.02, magnitude=1)
  • skew_top_bottom(probability=0.03, magnitude=1)
  • skew_corner(probability=0.03, magnitude=1)
  • skew(probability=0.01, magnitude=1)
  • random_erasing(probability=0.01, rectangle_area=0.11)
  • random_brightness(probability=0.5, min_factor=0.5, max_factor=1.5)
  • random_color(probability=0.2, min_factor=0, max_factor=1)
  • random_contrast(probability=0.3, min_factor=0.4, max_factor=1)
  • invert(probability=0.09)
  • resize(probability=1, width=256, height=256)

For further details on the individual augmentation operations, please check out the documentation of the Augmentor library.
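The sketch below shows how a few of these operations could be registered on an Augmentor pipeline. It is an illustration, not the repository's image_augmentation.py. It uses only operations that exist under the same names in the upstream Augmentor package, with parameters copied from the list above. gaussian_noise, blur and random_filter are left out because they may come from a customized version of the library. The helper names and the directory arguments are hypothetical.

```python
# Subset of the listed operations, as (Pipeline method name, keyword arguments).
OPERATIONS = [
    ("rotate", {"probability": 0.3, "max_left_rotation": 25, "max_right_rotation": 25}),
    ("zoom", {"probability": 0.1, "min_factor": 1.01, "max_factor": 1.03}),
    ("random_erasing", {"probability": 0.01, "rectangle_area": 0.11}),
    ("random_brightness", {"probability": 0.5, "min_factor": 0.5, "max_factor": 1.5}),
    ("random_contrast", {"probability": 0.3, "min_factor": 0.4, "max_factor": 1}),
    ("invert", {"probability": 0.09}),
    ("resize", {"probability": 1, "width": 256, "height": 256}),
]


def expected_operations_per_image(operations=OPERATIONS):
    """Expected number of operations that fire on one image.

    By linearity of expectation this is the sum of the nominal probabilities.
    """
    return sum(kwargs["probability"] for _, kwargs in operations)


def build_pipeline(source_dir, output_dir):
    """Create an Augmentor pipeline and register OPERATIONS in order."""
    import Augmentor  # imported lazily so the helper above works without it

    pipeline = Augmentor.Pipeline(source_directory=source_dir,
                                  output_directory=output_dir)
    for method_name, kwargs in OPERATIONS:
        getattr(pipeline, method_name)(**kwargs)
    return pipeline
```

With Augmentor installed, calling build_pipeline(...).sample(n) would write n augmented images. Each operation is applied in the order it was added and fires with its own probability, so low-probability entries such as random_erasing affect only a small fraction of the output.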

Requirements

  • Ubuntu >= 18.04 Bionic Beaver
  • Install requirements via:
    pip install -r requirements.txt
    
  • Make sure you have CUDA 10.1 and a compatible cuDNN installed. To install TensorFlow 2.0, run:
    pip install tensorflow-gpu==2.0.0-rc0
    

Script Arrangement & Order of Execution

Image Generation and Augmentation

------------------------------------------------------------------------------------------------------
Folder Structure
------------------------------------------------------------------------------------------------------

.
├── bfonts/
├── digit_generation_src/
    ├── directory_generation_check_font.py [Making necessary directories to contain the dataset]
    ├── digit_generation.py [Generating the actual digits]
    ├── image_augmentation.py [Augmenting the images]
    ├── mixing_aug_image_with_gen_image.py [Mixing the augmented images with the generated images]
    ├── digit_gen_main.py [Script containing the main function]

To run the digit generation and augmentation pipeline:

  • Create a folder named custom under /usr/share/fonts/truetype with the following command:
    sudo mkdir /usr/share/fonts/truetype/custom

  • Copy the fonts from the bfonts folder to /usr/share/fonts/truetype/custom:
    sudo cp -r ./bfonts/. /usr/share/fonts/truetype/custom

  • Run the digit_gen_main.py file to generate and augment the images and place them in their corresponding folders:
    python digit_gen_main.py
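To illustrate what the generation step involves, here is a minimal sketch that lists the installed fonts and renders one Bengali digit with Pillow. It is not the repository's digit_generation.py. The function names, the glyph size and the centering are assumptions, and the anchor argument requires Pillow 8.0 or newer. The only fixed fact it relies on is that the Bengali digits 0 to 9 occupy Unicode code points U+09E6 to U+09EF.

```python
from pathlib import Path

BENGALI_ZERO = 0x09E6  # Unicode code point of the Bengali digit zero


def bengali_digit(label):
    """Return the Bengali glyph for a class label in 0..9."""
    if not 0 <= label <= 9:
        raise ValueError("label must be between 0 and 9")
    return chr(BENGALI_ZERO + label)


def list_fonts(font_dir="/usr/share/fonts/truetype/custom"):
    """Return the sorted .ttf/.otf files found directly in font_dir."""
    return sorted(p for p in Path(font_dir).iterdir()
                  if p.suffix.lower() in {".ttf", ".otf"})


def render_digit(label, font_path, size=256, background="white"):
    """Draw one digit centered on a size x size canvas (hypothetical layout)."""
    from PIL import Image, ImageDraw, ImageFont  # lazy import: Pillow is optional here

    image = Image.new("RGB", (size, size), background)
    font = ImageFont.truetype(str(font_path), int(size * 0.7))
    draw = ImageDraw.Draw(image)
    # anchor="mm" centers the text on the given point (TrueType fonts only).
    draw.text((size // 2, size // 2), bengali_digit(label),
              font=font, fill="black", anchor="mm")
    return image
```

If list_fonts() returns an empty list after the copy step, the fonts did not land in the expected folder.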
    

Training and Validating a Baseline CNN model

------------------------------------------------------------------------------------------------------
Folder Structure
------------------------------------------------------------------------------------------------------

.
├── cnn_src/
    ├── train.py
    ├── test.py
    ├── test.png
├── model/
    ├── prinumco_mobilenet.h5
├── results/
    ├── acc.png
    ├── loss.py
    ├── test.png

We used TensorFlow 2.0's Keras API to construct and train a MobileNetV2 architecture as a baseline CNN model for benchmarking purposes.

  • Put the dataset folder in the root folder (in case you haven't generated the images yourself).
  • Run the train.py file to train the baseline model.
  • To test the model, run the test.py file (this loads our pretrained model and predicts the class of a sample test.png image).
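For orientation, a MobileNetV2 classifier for this dataset could be assembled roughly as follows. This is a sketch under stated assumptions, not the contents of train.py or test.py. It assumes 256 x 256 RGB inputs, randomly initialized weights, a 10-way softmax head, and that class index i corresponds to the folder named i. The optimizer, loss and helper names are illustrative choices.

```python
NUM_CLASSES = 10             # digits 0-9, one folder per class
INPUT_SHAPE = (256, 256, 3)  # assumed from the 256 x 256 generated images


def build_model():
    """Build and compile a MobileNetV2 backbone with a softmax head."""
    import tensorflow as tf  # lazy import: only needed when a model is built

    backbone = tf.keras.applications.MobileNetV2(
        input_shape=INPUT_SHAPE, include_top=False, weights=None)
    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def decode_prediction(probabilities):
    """Map a 10-way probability vector to (digit label, confidence)."""
    if len(probabilities) != NUM_CLASSES:
        raise ValueError("expected one probability per digit class")
    label = max(range(NUM_CLASSES), key=lambda i: probabilities[i])
    return label, probabilities[label]
```

decode_prediction would be applied to one row of the model's predict output, for example the prediction for test.png.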

Contributors