
Bangla handwriting recognition using the BanglaWriting dataset.


Bangla OCR

Dataset Description:

BanglaWriting

Process

Preprocessing

The dataset is distributed as raw page images and needs preprocessing before it can be used for training. Word images are extracted from the raw image folder using the provided JSON file. During extraction, each cropped word image is binarized with Otsu's method. The extracted filenames follow the pattern below.

"পরিবার 18__225_15_1.jpg" as "label wordNumberOfThePage__uniquePersonNumber_age_gender.extension"
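The two preprocessing details above, Otsu binarization and the filename convention, can be illustrated with a minimal sketch. This is not the code from generator.py: the function names `otsu_threshold`, `binarize`, and `parse_filename` are hypothetical, and the threshold is computed in plain Python only to show what the method does. In practice the same step would more likely be done with OpenCV, for example `cv2.threshold` with the `cv2.THRESH_OTSU` flag. The meaning of the gender code is not documented here, so it is kept as a raw integer.

```python
from pathlib import Path
from typing import List, NamedTuple


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # everything is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels: List[int]) -> List[int]:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


class WordMeta(NamedTuple):
    label: str
    word_number: int
    person_id: int
    age: int
    gender: int  # raw code from the filename; its meaning is not stated here


def parse_filename(name: str) -> WordMeta:
    """Split 'label wordNumber__person_age_gender.ext' into its fields."""
    stem = Path(name).stem
    left, right = stem.rsplit("__", 1)
    label, word_number = left.rsplit(" ", 1)
    person_id, age, gender = right.split("_")
    return WordMeta(label, int(word_number), int(person_id), int(age), int(gender))


meta = parse_filename("পরিবার 18__225_15_1.jpg")
print(meta.label, meta.word_number, meta.person_id, meta.age, meta.gender)
```

Parsing the label back out of the filename in this way is one option for building the (image, text) pairs that the model is trained on.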

Model

  • CRNN = CNN + BiDirectional GRU
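As a rough illustration of the CNN + bidirectional GRU idea, the sketch below shows one way such a model could be laid out in PyTorch. It is not the architecture from the notebook: the layer sizes, the input height of 32 pixels, the class name `CRNN`, and the number of output classes are all assumptions chosen for the example.

```python
import torch
from torch import nn


class CRNN(nn.Module):
    """Illustrative CRNN: conv feature extractor followed by a bidirectional GRU."""

    def __init__(self, num_classes: int, img_height: int = 32, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # height and width halved
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # height and width halved again
        )
        feature_size = 128 * (img_height // 4)
        self.rnn = nn.GRU(
            feature_size, hidden, num_layers=2,
            bidirectional=True, batch_first=True,
        )
        # Forward and backward GRU states are concatenated, hence 2 * hidden.
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.cnn(x)                                 # (N, C, H', W')
        n, c, h, w = f.shape
        # Treat each column of the feature map as one time step.
        f = f.permute(0, 3, 1, 2).reshape(n, w, c * h)  # (N, W', C*H')
        out, _ = self.rnn(f)                            # (N, W', 2*hidden)
        logits = self.fc(out)                           # (N, W', num_classes)
        return logits.permute(1, 0, 2)                  # (T, N, C), as nn.CTCLoss expects


model = CRNN(num_classes=80)
dummy = torch.zeros(2, 1, 32, 128)  # batch of two 32x128 grayscale word images
print(model(dummy).shape)
```

The width of the CNN output sets the number of time steps seen by the GRU, so the amount of horizontal pooling limits the longest label the model can emit.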

Loss Function

  • CTC Loss
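CTC lets the network train on (image, text) pairs without a per-character alignment by adding a blank symbol and summing over all frame-level paths that collapse to the target. At inference time, the simplest decoding rule is greedy: take the best class per frame, merge consecutive repeats, and drop blanks. The sketch below shows that rule in plain Python. The function name `ctc_greedy_decode` is hypothetical, and blank index 0 is an assumption (it matches the default of `torch.nn.CTCLoss`, which may or may not be what the notebook uses).

```python
from typing import List, Sequence

BLANK = 0  # assumed blank index


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Collapse the per-frame argmax path into a label sequence."""
    # Best class index for every time step.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    decoded: List[int] = []
    prev = None
    for idx in best_path:
        # Keep a symbol only if it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


def one_hot(i: int, size: int = 3) -> List[float]:
    return [1.0 if j == i else 0.0 for j in range(size)]


# Frame-level path 1 1 _ 1 2 2 _  (where _ is the blank)
scores = [one_hot(i) for i in [1, 1, 0, 1, 2, 2, 0]]
print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The blank between the two runs of class 1 is what allows a doubled character to survive the merge step.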

Optimizer

  • Adam

Usage

Download the dataset from the provided link, unzip the "raw" archive into the current directory, and run:

python generator.py

Finally, run the notebook.

Requirements

  • python==3.7.0
  • numpy==1.16.0
  • scikit-learn==0.23.2
  • opencv-python==4.4.0.46
  • torch==1.7.0
  • tqdm==4.53.0

Possible further improvements:

  • Preprocessing such as skew correction, noise removal, thinning and skeletonization
  • Gathering and/or generating synthetic data
  • Making the dataset balanced
  • Using Focal CTC loss to mitigate class imbalance
  • Using edit distance to map a prediction to the nearest known word
  • Trying a different optimizer such as RAdam
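The edit-distance idea from the list above could look roughly like the following sketch: compute the Levenshtein distance between the raw prediction and every word in a lexicon, then return the closest one. The names `edit_distance` and `nearest_word` and the tiny vocabulary are made up for illustration. The distance here is counted over Unicode code points, which for Bangla is not the same as counting visible grapheme clusters.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def nearest_word(prediction: str, vocabulary: Iterable[str]) -> str:
    """Return the vocabulary entry with the smallest edit distance."""
    return min(vocabulary, key=lambda w: edit_distance(prediction, w))


vocab = ["পরিবার", "বাংলা"]          # hypothetical lexicon
print(nearest_word("পরিবর", vocab))  # prediction with one missing vowel sign
```

A linear scan like this is fine for a small lexicon; for a large one, an index such as a BK-tree would avoid comparing against every word.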

References

  1. Handwriting to Text Conversion using Time Distributed CNN and LSTM with CTC Loss Function

  2. Use PyTorch’s DataLoader with Variable Length Sequences for LSTM/GRU

  3. Data Preparation for Variable Length Input Sequences

  4. Captcha recognition using PyTorch (Convolutional-RNN + CTC Loss)

  5. Image Text Recognition

  6. Sequence Modeling: Recurrent and Recursive Nets