Algorithmically generating Bengali digits and classification using MobileNetV2 for Bangladeshi license plate detection
Several initiatives have been taken to label and aggregate Bengali handwritten digit images with the aim of constructing robust digit recognition systems. However, deep learning models trained on handwritten digit data do not generalize well to images of printed digits. PriNumco is a compilation of printed Bengali digit images that aims to provide a robust training and validation dataset for building recognition systems that can classify images of printed digits sourced from license plates, billboards, banners and other real-life data sources.
Initially, the script uses 58 different Bengali fonts to generate 2320 images (256 x 256 pixels) of the 10 digits (232 images per digit) and passes them through an augmentation pipeline to generate 200k training images. A similar procedure with a different augmentation pipeline was followed to generate 28k test images.
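The counts above are consistent with rendering every digit once per font and once per background colour. As a rough check, the generation grid can be enumerated as below. This is only a sketch: it assumes the factor of four comes from the four background colours mentioned later in this README, and the font names are placeholders rather than the actual font files.

```python
from collections import Counter
from itertools import product

# Placeholder identifiers; the real font files live in the bfonts folder.
fonts = [f"font_{i:02d}" for i in range(58)]
# Assumption: one rendering per background colour explains 232 = 58 * 4.
backgrounds = ["white", "yellow", "sky blue", "teal"]
digits = range(10)

# One base image per (font, background, digit) combination.
base_images = list(product(fonts, backgrounds, digits))
per_digit = Counter(digit for _, _, digit in base_images)

print(len(base_images))  # 2320
print(per_digit[0])      # 232
```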
Generated images are organized in the following hierarchy:
------------------------------------------------------------------------------------------------------
Folder Structure
------------------------------------------------------------------------------------------------------
.
├── dataset/
    ├── train/
        ├── 0/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 1/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 2/
        .....
    ├── test/
        ├── 0/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 1/
            ├── img0.jpg
            ├── img1.jpg
            ...........
        ├── 2/
        .....
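A quick way to confirm that a generated or downloaded copy follows this hierarchy is to count the images in each class folder. The helper below is a minimal sketch; the function name `count_images` and the `dataset` location are assumptions made for illustration, not part of the repository.

```python
from pathlib import Path


def count_images(dataset_root):
    """Return {split: {digit: number_of_jpg_files}} for the layout above."""
    counts = {}
    for split_dir in sorted(Path(dataset_root).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(1 for _ in class_dir.glob("*.jpg"))
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


if __name__ == "__main__":
    root = Path("dataset")  # assumed location of the dataset folder
    if root.is_dir():
        for split, per_digit in count_images(root).items():
            print(split, per_digit)
```

Uneven counts across the ten class folders would suggest that generation or copying stopped partway through.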
In order to mimic real-life images of Bengali digits, we generated the images with white, yellow, sky blue and teal backgrounds and used the Augmentor library to apply the following augmentations to both the train and test datasets:
- gaussian_noise(probability=0.3, mean=0, std=20.0)
- blur(probability=0.6, blur_type='random', radius=(0, 100), fixed_radius=3)
- random_filter(probability=0.9, filter_type='random', size=5)
- black_and_white(probability=1, threshold=128)
- rotate(probability=0.3, max_left_rotation=25, max_right_rotation=25)
- rotate90(probability=0.005)
- rotate270(probability=0.005)
- zoom(probability=0.1, min_factor=1.01, max_factor=1.03)
- skew_tilt(probability=0.01, magnitude=1)
- skew_left_right(probability=0.02, magnitude=1)
- skew_top_bottom(probability=0.03, magnitude=1)
- skew_corner(probability=0.03, magnitude=1)
- skew(probability=0.01, magnitude=1)
- random_erasing(probability=0.01, rectangle_area=0.11)
- random_brightness(probability=0.5, min_factor=0.5, max_factor=1.5)
- random_color(probability=0.2, min_factor=0, max_factor=1)
- random_contrast(probability=0.3, min_factor=0.4, max_factor=1)
- invert(probability=0.09)
- resize(probability=1, width=256, height=256)
For further details on individual augmentation operations, please check out the documentation of the Augmentor library.
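To get a feel for how aggressive these settings are, the sketch below simulates which operations fire for a single image. It assumes that each operation is applied independently with its listed probability, in the order given, which is how Augmentor describes its `probability` parameter; it does not call Augmentor and does not transform any image.

```python
import random

# (operation name, probability) pairs copied from the list above.
OPERATIONS = [
    ("gaussian_noise", 0.3), ("blur", 0.6), ("random_filter", 0.9),
    ("black_and_white", 1.0), ("rotate", 0.3), ("rotate90", 0.005),
    ("rotate270", 0.005), ("zoom", 0.1), ("skew_tilt", 0.01),
    ("skew_left_right", 0.02), ("skew_top_bottom", 0.03),
    ("skew_corner", 0.03), ("skew", 0.01), ("random_erasing", 0.01),
    ("random_brightness", 0.5), ("random_color", 0.2),
    ("random_contrast", 0.3), ("invert", 0.09), ("resize", 1.0),
]


def sample_operations(rng):
    """Pick the operations that would fire for one image, in pipeline order."""
    return [name for name, p in OPERATIONS if rng.random() < p]


# Expected number of operations applied to a single image.
expected_ops = sum(p for _, p in OPERATIONS)
print(round(expected_ops, 2))  # 5.41

rng = random.Random(0)
print(sample_operations(rng))
```

Under that assumption an image passes through about five or six operations on average, and `black_and_white` and `resize` are always applied because their probability is 1.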
Requirements
- Ubuntu >= 18.04 Bionic Beaver
- Install requirements via:
pip install -r requirements.txt
- Make sure you have CUDA 10.1 and a compatible cuDNN installed. To install TensorFlow 2.0, run:
pip install tensorflow-gpu==2.0.0-rc0
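After installation, it can help to confirm that TensorFlow imports and can see a GPU. The snippet below is a small sketch for that purpose; the helper name `check_tensorflow` is an assumption made for this example and is not part of the repository.

```python
def check_tensorflow():
    """Return TensorFlow version and visible GPUs, or None if not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {"version": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


if __name__ == "__main__":
    info = check_tensorflow()
    if info is None:
        print("TensorFlow is not installed in this environment.")
    else:
        print("TensorFlow", info["version"], "GPUs:", info["gpus"])
```

An empty GPU list usually points to a CUDA or cuDNN mismatch rather than a problem with the project itself.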
Image Generation and Augmentation
------------------------------------------------------------------------------------------------------
Folder Structure
------------------------------------------------------------------------------------------------------
.
├── bfonts/
├── digit_generation_src/
    ├── directory_generation_check_font.py [Making necessary directories to contain the dataset]
    ├── digit_generation.py [Generating the actual digits]
    ├── image_augmentation.py [Augmenting the images]
    ├── mixing_aug_image_with_gen_image.py [Mixing the augmented images with the generated images]
    ├── digit_gen_main.py [Script containing the main function]
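The mixing step combines the clean generated images with their augmented counterparts in one class folder. The sketch below shows one way such a step could work. It is not the code of mixing_aug_image_with_gen_image.py; the function name `mix_images` and its three directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def mix_images(generated_dir, augmented_dir, target_dir):
    """Copy .jpg files from both sources into target_dir as img0.jpg, img1.jpg, ..."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    sources = sorted(Path(generated_dir).glob("*.jpg"))
    sources += sorted(Path(augmented_dir).glob("*.jpg"))
    for index, source in enumerate(sources):
        # Renumber so files with the same name in both folders cannot collide.
        shutil.copy(source, target / f"img{index}.jpg")
    return len(sources)
```

Renumbering on copy keeps the img0.jpg, img1.jpg naming used in the dataset hierarchy.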
To run the digit generation and augmentation pipeline:
- Make a folder named custom in the path /usr/share/fonts/truetype via the following command:
sudo mkdir /usr/share/fonts/truetype/custom
- Copy the fonts from the bfonts folder to the /usr/share/fonts/truetype/custom path via:
sudo cp -r ./bfonts/. /usr/share/fonts/truetype/custom
- Run the digit_gen_main.py file to generate, augment and prepare the images in their corresponding folders:
python digit_gen_main.py
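For readers who want to see what the generation step amounts to, the following sketch renders a single Bengali digit with Pillow. It is an illustration under stated assumptions, not the code of digit_generation.py: the function name `render_digit`, the glyph size (60% of the canvas) and the black fill colour are all choices made for this example, and `textbbox` requires Pillow 8.0 or newer.

```python
# Bengali digits occupy the Unicode range U+09E6 (zero) to U+09EF (nine).
BENGALI_DIGITS = [chr(0x09E6 + value) for value in range(10)]


def render_digit(value, font_path, background="white", size=256):
    """Draw one Bengali digit, roughly centred, on a square background."""
    # Pillow is imported lazily so the constant above works without it.
    from PIL import Image, ImageDraw, ImageFont

    image = Image.new("RGB", (size, size), background)
    font = ImageFont.truetype(font_path, int(size * 0.6))
    draw = ImageDraw.Draw(image)
    text = BENGALI_DIGITS[value]
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    # Offset by the bounding box origin so the glyph sits in the middle.
    x = (size - (right - left)) / 2 - left
    y = (size - (bottom - top)) / 2 - top
    draw.text((x, y), text, font=font, fill="black")
    return image
```

Calling `render_digit(5, font_path).save("img0.jpg")` with the path of any font copied into /usr/share/fonts/truetype/custom would write one 256 x 256 base image.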
Training and Validating a Baseline CNN model
------------------------------------------------------------------------------------------------------
Folder Structure
------------------------------------------------------------------------------------------------------
.
├── cnn_src/
    ├── train.py
    ├── test.py
    ├── test.png
    ├── model/
        ├── prinumco_mobilenet.h5
    ├── results/
        ├── acc.png
        ├── loss.py
        ├── test.png
We used TensorFlow 2.0's Keras API to construct and train a MobileNetV2 architecture to provide a baseline CNN model for benchmarking purposes.
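A baseline of this kind can be sketched with the Keras applications module as shown below. This is not the contents of train.py: the input size, training from scratch (`weights=None`), the optimizer, the loss and the batch size are assumptions chosen for illustration.

```python
IMAGE_SIZE = (256, 256)  # matches the generated images; the training size is an assumption
NUM_CLASSES = 10         # digits 0-9


def build_model():
    """Build an untrained MobileNetV2 classifier for ten digit classes."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    model = tf.keras.applications.MobileNetV2(
        input_shape=IMAGE_SIZE + (3,),
        weights=None,  # train from scratch (assumption)
        classes=NUM_CLASSES,
    )
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def make_generator(directory, batch_size=32):
    """Stream images from a folder-per-class layout such as dataset/train."""
    import tensorflow as tf

    datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(
        directory,
        target_size=IMAGE_SIZE,
        class_mode="sparse",
        batch_size=batch_size,
    )
```

In TensorFlow 2.0 the iterator returned by `make_generator("dataset/train")` can be passed to `model.fit_generator`; later releases accept it directly in `model.fit`.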
- Put the dataset folder in the root folder (in case you haven't generated the images yourself)
- Run the train.py file to train the baseline model
- For testing out the model, run the test.py file (this will load our pretrained model to predict the class of a sample test.png image)
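The prediction step can be sketched as follows. This is an illustration rather than the code of test.py: the function names are hypothetical, and the 256 x 256 input size and the scaling of pixel values to [0, 1] are assumptions that must match whatever preprocessing was used in training.

```python
def top_class(probabilities):
    """Return the index of the largest probability (the predicted digit)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def predict_digit(model_path, image_path, image_size=(256, 256)):
    """Load a saved Keras model and classify a single image file."""
    import numpy as np
    import tensorflow as tf

    model = tf.keras.models.load_model(model_path)
    image = tf.keras.preprocessing.image.load_img(image_path, target_size=image_size)
    # Assumption: the model expects pixel values scaled to [0, 1].
    pixels = tf.keras.preprocessing.image.img_to_array(image) / 255.0
    batch = np.expand_dims(pixels, axis=0)
    return top_class(model.predict(batch)[0].tolist())
```

If the class folders 0 to 9 were read in sorted order during training, the returned index is the digit itself, so `predict_digit` called with the paths of prinumco_mobilenet.h5 and test.png would return the predicted digit.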