Summer Challenge | Writer Verification

National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics (NCVPRIPG'23) Link

Index

Model Pipeline
Installation & Requirements
Usage
Authors
Acknowledgment

Model Pipeline:

Data preprocessing:

Segment the images using the CRAFT algorithm.
Binarize the images.
Invert the images (black text on white background).
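The binarize and invert steps above can be sketched as follows. This is a minimal, dependency-free illustration, not the code in preprocess.py. It assumes grey-level images stored as nested lists of 0..255 integers, and it assumes Otsu's method picks the threshold (the README does not say which binarization is used). All function names are hypothetical, and the CRAFT segmentation step is not shown.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D list of 0..255 grey levels."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of the levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor)
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray):
    """Map every pixel to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(gray)
    return [[255 if value > t else 0 for value in row] for row in gray]


def invert(binary):
    """Swap black and white."""
    return [[255 - value for value in row] for row in binary]


def ensure_black_on_white(binary):
    """Invert only if the image is mostly black.

    Assumption: the background covers more pixels than the text.
    """
    flat = [value for row in binary for value in row]
    white = sum(1 for value in flat if value == 255)
    return binary if 2 * white >= len(flat) else invert(binary)


# Toy 3x3 crop: dark strokes (20..30) on light paper (200..225)
crop = [[200, 210, 30], [205, 20, 215], [220, 225, 25]]
clean = ensure_black_on_white(binarize(crop))
```

The majority-pixel check is only a heuristic for deciding when inversion is needed; a crop dominated by thick strokes would defeat it.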

Feature extraction & Model Training

We append a linear layer with 1352 nodes to the tail of the MobileNet architecture and train the network as a classification problem. We then discard this layer and classify the validation dataset with a nearest-neighbour search over the distances between the 1000-dimensional features.
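The nearest-neighbour step can be illustrated with a short sketch. It assumes Euclidean distance (the README does not name the metric) and uses toy 3-dimensional vectors in place of the 1000-dimensional features; the function name and the writer labels are hypothetical.

```python
import math


def nearest_neighbour(query, gallery):
    """Return (label, distance) of the gallery feature closest to `query`.

    `gallery` is a list of (label, feature_vector) pairs.
    """
    best_label, best_dist = None, math.inf
    for label, feature in gallery:
        d = math.dist(query, feature)  # Euclidean distance, Python 3.8+
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Toy features standing in for the network outputs
gallery = [
    ("writer_a", [0.0, 0.0, 0.0]),
    ("writer_b", [1.0, 1.0, 1.0]),
]
label, dist = nearest_neighbour([0.9, 1.0, 1.1], gallery)
```

A linear scan like this is fine for small galleries; a saved k-NN index is the usual choice once the number of reference features grows.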

Evaluating the model based on ROC and AUC:
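As a rough illustration of the metric, AUC can be computed as the fraction of (same-writer, different-writer) pairs that are ranked correctly. The sketch below assumes label 1 means "same writer" and uses the negated feature distance as the score, so that closer pairs score higher; the function name and the numbers are made up for the example. scikit-learn's roc_auc_score computes the same quantity.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pair labels (1 = same writer) and feature distances
labels = [1, 1, 0, 0]
distances = [0.2, 0.9, 0.8, 1.5]
auc = roc_auc(labels, [-d for d in distances])
```

The pairwise loop is quadratic in the number of pairs, which is acceptable for a sanity check but not for a large test set.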

Installation

Python Libraries used:

PIL
CV2 (OpenCV)
PyTorch and torchvision
Matplotlib
Sklearn
In addition, we use the Tesseract OCR engine. Install the engine for your OS (with Hindi language support) together with the Python library pytesseract.

* You can install all the required libraries using the Python code below:

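import json
import subprocess
import sys

# Read the list of required packages from the JSON file
with open('dependencies.json', 'r') as file:
    data = json.load(file)

# Install each package with the pip that belongs to the running interpreter
for library in data['dependencies']:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', library])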

Usage

Place the training dataset in the dataset folder.
Download the CRAFT model from the official repository or from the [CRAFT Link] and edit the model_path variable in the preprocess.py file.
Run the script preprocess.py. This creates a new folder, preprocessedTrain, inside the dataset folder, containing the binarised and cropped images.
Run all the cells in the train.ipynb file. Be sure to change the path of the testdata folder to match its location on your system. On Windows, the forward slashes in paths have to be changed to backslashes.
The model .pth file will be saved in the local directory.
The Eval script has to be run on Colab. You can run all the cells, or only calculate the distances between test images and then predict using the saved knn file.
Before running the Colab script, download the dataset.zip file into your Drive folder. The .pth file, semi-test.zip and test.csv files are to be placed inside an ncv folder.
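Regarding the path separators mentioned in the training step: one way to avoid editing slashes by hand is to build paths with pathlib, which uses the separator of the OS the code runs on. This is a suggestion rather than something the notebooks are known to do; the folder names come from the steps above and the helper name is hypothetical.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Folder names taken from the usage steps
PARTS = ("dataset", "preprocessedTrain")


def train_folder(root="."):
    """Build the folder path with the separator of the current OS."""
    return Path(root).joinpath(*PARTS)


# The same parts rendered with each platform's separator, for comparison
windows_style = str(PureWindowsPath(*PARTS))
posix_style = str(PurePosixPath(*PARTS))
```

Passing the result of train_folder() to open() or to an image loader works unchanged on Windows, Linux, and Colab.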

Link

Authors

1. Anushkaa Ambuj
2. Anupama Birman
3. Shreyas Vaidya

Acknowledgments

Hafemann et al., Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks.
[Official implementation of Character Region Awareness for Text Detection (CRAFT)](https://github.com/clovaai/CRAFT-pytorch)

Shreyasvaidya/Open-CV-Summer-Challenge