This project implements a deep learning-based approach to remove various types of noise from images, specifically designed as a preprocessing step for Optical Character Recognition (OCR) systems. The objective is to enhance OCR accuracy by reducing noise interference in images affected by uniform noise and flip effects.
- Denoising Models: Implements dilated U-Net and GAN architectures to remove noise from images effectively.
- Noise Types Supported:
- Uniform noise, which is characterized by random variations in pixel intensity.
- Flip effect, where images may appear horizontally inverted.
- Preprocessing for OCR: Enhances image quality for improved OCR performance by removing noise and artifacts.
- Flexible Architecture: Both U-Net and GAN models are implemented, allowing for comparative analysis of their denoising capabilities.
- Programming Language: Python
- Deep Learning Frameworks: TensorFlow and Keras are used for building and training neural networks.
- Image Processing: NumPy and SciPy are used to process raw images.
- Plotting: Matplotlib for visualization of results.
For this study, we built a dataset by gathering a diverse collection of PDFs available online. From these documents, we manually curated a total of 1,500 book pages. To make the dataset suitable for model training, we corrupted the images with several types of synthetic noise, including uniform noise, flip effects, and intensity noise, and paired each noisy image with its corresponding clean ground truth. This strategy simulates the real-world imperfections typically found in document images and provides a solid foundation for assessing the models' performance under different noise scenarios.
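The noisy/clean pairing described above can be sketched as follows. This is a minimal illustration, not the project's actual data pipeline; the function names, the noise range, and the 64×64 dummy image are assumptions for the example.

```python
import numpy as np

def add_uniform_noise(image, low=-0.1, high=0.1, rng=None):
    """Add uniform noise in [low, high) to an image scaled to [0, 1]."""
    rng = rng or np.random.default_rng()
    noisy = image + rng.uniform(low, high, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_flip_effect(image):
    """Simulate the flip effect by horizontally inverting the image."""
    return image[:, ::-1]

# Pair each synthetically corrupted image with its clean ground truth.
clean = np.full((64, 64), 0.5)
pairs = [
    (add_uniform_noise(clean), clean),
    (add_flip_effect(clean), clean),
]
```

Training data for the denoisers would then consist of such (noisy, clean) pairs, with the clean page serving as the regression target.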
The project includes scripts for training both the U-Net and GAN models. By executing these scripts, you can initiate the training process with your dataset.
U-Net is a convolutional neural network architecture originally designed for image segmentation tasks. In this project, it is adapted for denoising, using skip connections to preserve spatial information. The figure below shows the proposed architecture: the purple layers are standard convolutional layers, while the green and blue blocks are dilated convolutions and concatenation layers, respectively.
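A compact Keras sketch of this kind of architecture is shown below: standard convolutions in the encoder and decoder, dilated convolutions in the bottleneck to enlarge the receptive field, and skip connections implemented as concatenations. The layer widths, depths, and dilation rates here are illustrative assumptions, not the project's exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dilated_unet(input_shape=(256, 256, 1)):
    """Minimal dilated U-Net sketch (illustrative sizes, not the paper's config)."""
    inputs = keras.Input(shape=input_shape)

    # Encoder: plain convolutions with downsampling
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)

    # Dilated bottleneck: larger receptive field without further downsampling
    b = layers.Conv2D(128, 3, padding="same", dilation_rate=2, activation="relu")(p2)
    b = layers.Conv2D(128, 3, padding="same", dilation_rate=4, activation="relu")(b)

    # Decoder: upsampling with skip connections via concatenation
    u2 = layers.UpSampling2D()(b)
    u2 = layers.Concatenate()([u2, c2])
    c3 = layers.Conv2D(64, 3, padding="same", activation="relu")(u2)
    u1 = layers.UpSampling2D()(c3)
    u1 = layers.Concatenate()([u1, c1])
    c4 = layers.Conv2D(32, 3, padding="same", activation="relu")(u1)

    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return keras.Model(inputs, outputs)
```

The skip connections give the decoder direct access to high-resolution encoder features, which is what lets the network preserve fine text strokes while removing noise.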
Generative Adversarial Networks (GANs) consist of two neural networks, a generator and a discriminator, trained simultaneously. The generator produces denoised images, while the discriminator evaluates them against clean images. Adversarial training allows the generator to improve over time. A sample of the GAN's output is shown below.
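The adversarial setup can be sketched as follows. Both network architectures and the training step are illustrative assumptions (the project's real generator is the dilated U-Net above, and its discriminator may differ); this only shows the alternating-update pattern.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model, Input

def build_generator(input_shape=(64, 64, 1)):
    """Tiny stand-in generator; in the project this role is played by the U-Net."""
    inp = Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    return Model(inp, out)

def build_discriminator(input_shape=(64, 64, 1)):
    """Scores an image as clean (1) or generated/noisy (0); illustrative layout."""
    inp = Input(shape=input_shape)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model(inp, out)

bce = tf.keras.losses.BinaryCrossentropy()

def train_step(generator, discriminator, g_opt, d_opt, noisy, clean):
    """One adversarial update: discriminator learns real vs. generated,
    generator learns to fool the discriminator."""
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(noisy, training=True)
        real_score = discriminator(clean, training=True)
        fake_score = discriminator(fake, training=True)
        d_loss = (bce(tf.ones_like(real_score), real_score)
                  + bce(tf.zeros_like(fake_score), fake_score))
        g_loss = bce(tf.ones_like(fake_score), fake_score)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```

In practice, the discriminator and generator updates are repeated over the noisy/clean pairs until the discriminator can no longer reliably separate denoised outputs from clean pages.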
For the U-Net, the loss function is a weighted sum of the MSE and MAE between the network's output and the ground truth. For the GAN, the loss is provided by the discriminator, whose main responsibility is to distinguish noisy images from noise-free ones.
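The U-Net loss can be written compactly as follows. The weighting factor `alpha` is a tuning choice assumed for illustration; the source does not state the actual weights.

```python
import tensorflow as tf

def unet_loss(y_true, y_pred, alpha=0.8):
    """Weighted sum of MSE and MAE against the ground truth.
    alpha is an illustrative weight, not the project's actual value."""
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    return alpha * mse + (1.0 - alpha) * mae
```

Mixing MSE and MAE is a common compromise: MSE penalizes large deviations strongly, while MAE is more robust to outlier pixels.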
The table below reports results from the Tesseract and Gnome OCR engines, showing that our model clearly boosts OCR performance.
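One simple way to quantify such a gain is a character-level similarity between the OCR output and the reference transcription. The sketch below uses Python's standard-library `difflib` for this; the metric and the example strings are assumptions for illustration, not the evaluation protocol used for the table.

```python
import difflib

def character_accuracy(ocr_text, reference):
    """Character-level similarity in [0, 1] between OCR output and the
    reference transcription (illustrative metric, not the project's)."""
    return difflib.SequenceMatcher(None, ocr_text, reference).ratio()

# Hypothetical OCR outputs on a noisy scan vs. its denoised version.
noisy_ocr = "Tbe quick brovvn fox"
denoised_ocr = "The quick brown fox"
reference = "The quick brown fox"
```

Comparing `character_accuracy(noisy_ocr, reference)` with `character_accuracy(denoised_ocr, reference)` makes the improvement from denoising directly measurable.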
Contributions to this repository are not accepted.