
A toolkit for learning efficient document image skew estimation (DISE).

Primary language: Python. License: MIT.

neural-deskew

A shallow multi-layer perceptron (MLP) for document image deskewing, built on top of three classical deskew algorithms.

Usage

TODO

Abstract

This project develops a neural network model for document image skew estimation, using a Multi-Layer Perceptron (MLP) architecture and the Albumentations library for data augmentation. The goal is to accurately estimate the skew angle of a document image.

Dataset

A custom dataset is prepared, comprising 2000 document images with associated ground truth skew angles. The dataset is split into 1500 images for training and validation and 500 images for testing. Each image is processed to restore vertical alignment and to make the model robust to variations in size, occlusion, rotation, and lighting.
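The 1500/500 split described above could be reproduced along the following lines. This is only a sketch: the function name `split_dataset`, the file naming pattern, and the fixed seed are assumptions made for illustration, not the project's actual data loading code.

```python
import random


def split_dataset(paths, n_test=500, seed=0):
    """Shuffle deterministically and hold out n_test items for testing.

    Returns (train_val, test). Hypothetical helper, not part of the repo.
    """
    if n_test > len(paths):
        raise ValueError("n_test is larger than the dataset")
    shuffled = list(paths)                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # seeded RNG makes the split reproducible
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder file names standing in for the 2000 document images.
paths = [f"doc_{i:04d}.png" for i in range(2000)]
train_val, test = split_dataset(paths)
```

Fixing the seed keeps the held-out test images identical across runs, so test metrics from different experiments stay comparable.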

Model

The proposed MLP model takes as input three confidence vectors, each generated by a different deskewing technique. Each vector represents the likelihood that the document is rotated by each candidate angle. The MLP processes these vectors and produces a unified confidence vector spanning the entire angle space. The model architecture includes convolutional layers to process the confidence vectors, followed by fully connected layers and dropout regularization to improve generalization.
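A minimal PyTorch sketch of an architecture matching this description is shown below. Everything specific in it is an assumption: the class name `DeskewMLP`, the 180 angle bins, stacking the three vectors as channels of a 1-D signal, and the layer sizes, kernel size, and dropout rate. The repository's actual model may differ.

```python
import torch
from torch import nn


class DeskewMLP(nn.Module):
    """Fuses three per-angle confidence vectors into one (illustrative only)."""

    def __init__(self, num_angles: int = 180, hidden_dim: int = 128, dropout: float = 0.2):
        super().__init__()
        # Assumption: the three input vectors are treated as three channels.
        self.conv = nn.Sequential(
            nn.Conv1d(3, 8, kernel_size=5, padding=2),  # padding keeps length num_angles
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),                               # (batch, 8 * num_angles)
            nn.Linear(8 * num_angles, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_angles),          # one logit per angle bin
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, num_angles) -> logits: (batch, num_angles)
        return self.head(self.conv(x))


model = DeskewMLP().eval()                 # eval() disables dropout for inference
batch = torch.rand(4, 3, 180)              # random stand-in for real confidence vectors
with torch.no_grad():
    confidence = torch.softmax(model(batch), dim=-1)
predicted_bin = confidence.argmax(dim=-1)  # most likely angle bin per image
```

Applying a softmax to the output turns the logits into a single confidence distribution over the angle bins, and the arg-max bin gives the estimated skew.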

Training

Training is managed with PyTorch Lightning, which simplifies the training loop and provides features such as early stopping. The PyTorch Lightning Trainer is configured with early stopping at a patience of three to limit overfitting. Training progress and metrics are logged with the Weights & Biases (W&B) library for experiment tracking and visualization.
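To make "a patience of three" concrete, here is a plain-Python illustration of patience-based stopping. It is not PyTorch Lightning's implementation (Lightning provides this behaviour through its `EarlyStopping` callback and its `patience` argument). The class name `PatienceStopper` and the validation losses are made up for the example.

```python
class PatienceStopper:
    """Signals a stop after `patience` consecutive checks without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1      # no improvement on this check
        return self.bad_epochs >= self.patience


stopper = PatienceStopper(patience=3)
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]  # invented values
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With these losses the best value (0.65) occurs at epoch 2, the next three checks fail to improve on it, and the loop stops at epoch 5 without reaching the final entry.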

To train the model, a training.py script is provided. It takes arguments for the dataset directory, model configuration YAML file, training hyperparameters, and data split ratios. The script loads the data, initializes the model and Trainer, and begins the training process. Additionally, a run_training.sh script is available to launch training using default configurations.
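The command-line interface of such a script might look roughly like the sketch below. The flag names and default values are hypothetical, chosen only to mirror the argument groups listed above (dataset directory, model configuration file, hyperparameters, split ratio). Check training.py itself for the real options.

```python
import argparse


def build_parser():
    """Build an illustrative argument parser; flag names are assumptions."""
    parser = argparse.ArgumentParser(description="Train the deskew model (illustrative CLI).")
    parser.add_argument("--data-dir", required=True, help="directory containing the dataset")
    parser.add_argument("--model-config", default="model_config.yaml", help="model configuration YAML file")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--max-epochs", type=int, default=50)
    parser.add_argument("--val-ratio", type=float, default=0.2, help="fraction of training data used for validation")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--data-dir", "data/", "--lr", "0.0005"])
```

Flags that are not passed fall back to their defaults, which is how a wrapper such as run_training.sh can launch training with a default configuration.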

Checkpoint

TODO

The model weights and architecture are checkpointed using the checkpoint

Configuration

The project includes two configuration files, config.yaml and model_config.yaml, for customizing hyperparameters such as learning rate, batch size, hidden dimension, and number of epochs. These files allow the training process to be adapted to specific requirements.
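One way to consume such hyperparameters in code is a small typed container that fills in defaults and rejects unknown keys. The sketch below is an assumption throughout: the field names, default values, and the helper `config_from_dict` are hypothetical, and the dictionary stands in for whatever a YAML loader would return from config.yaml.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainConfig:
    # Hypothetical defaults; the real values live in the YAML files.
    learning_rate: float = 1e-3
    batch_size: int = 32
    hidden_dim: int = 128
    max_epochs: int = 50


def config_from_dict(raw):
    """Build a TrainConfig from a parsed mapping, rejecting misspelled keys."""
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw)


# Stand-in for the mapping parsed from a YAML file.
cfg = config_from_dict({"learning_rate": 3e-4, "batch_size": 64})
```

Rejecting unknown keys catches typos in a configuration file early, before a run silently trains with a default value.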

Resources