This repository implements an unsupervised image enhancement algorithm that combines Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). Designed primarily for dental imaging applications, the algorithm addresses challenges such as inconsistent lighting, noise, and fine-detail enhancement, producing high-quality outputs without the need for paired training datasets. The approach builds on the ideas of "Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network", with the network structure and loss functions optimized to improve performance.
The generator transforms low-quality images into high-quality images through several key components:
- Initial Layers: The input image is processed through a series of convolutional layers to extract features. Each convolutional layer consists of filters, activation functions (LeakyReLU), and batch normalization for efficient learning.
- Objective (Global Attention Module, GAM): Captures global image properties, such as lighting and color distribution, ensuring overall consistency in enhanced images (a code sketch follows the structure description below).
- Structure:
- Initial Convolution Layers: Applies convolution with filters of size 7x7 and 5x5, extracting global features from the input feature map.
- Combination Operations: Combines outputs through concatenation operations and additional convolutions.
- Activation Functions: LeakyReLU activation is used after each convolutional step to introduce non-linearity.
- Skip Connections: Incorporates residual connections, which add the input to the output to promote gradient flow and preserve key features throughout the layers.
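The following is a minimal PyTorch sketch of how such a global feature block could be assembled. The `GlobalBlock` name, channel count, and 1x1 fusion layer are illustrative assumptions rather than the repository's exact layers; only the 7x7/5x5 convolutions, LeakyReLU activations, and residual connection follow the description above.

```python
import torch
import torch.nn as nn

class GlobalBlock(nn.Module):
    """Illustrative global feature block: parallel 7x7 and 5x5 convolutions,
    batch normalization, LeakyReLU, concatenation, and a residual connection."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv7 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.conv5 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Combine both branches by concatenation, then reduce back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        combined = torch.cat([self.conv7(x), self.conv5(x)], dim=1)
        # Skip connection: add the input back to promote gradient flow.
        return x + self.fuse(combined)
```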
- Objective (Local Feature Enhancement, LFE): Focuses on enhancing fine-grained details by attending to specific regions in the image using channel and spatial attention mechanisms (a code sketch follows the CAM and SAM descriptions below).
- Channel Attention Module (CAM):
- Description: Emphasizes important channels (features) by computing and applying weights based on global pooling operations (average and max).
- Steps:
- Global Average and Max Pooling on input features.
- Passing pooled outputs through shared multi-layer perceptrons (MLPs) to compute attention weights.
- Multiplication of attention weights with original features to highlight essential channels.
- Spatial Attention Module (SAM):
- Description: Focuses on key spatial regions within the feature maps.
- Steps:
- Pooling (average and max) along the channel dimension.
- Concatenation of pooled outputs, followed by convolution with a 7x7 kernel to generate spatial attention weights.
- Multiplication of the resulting weights with input features for spatial refinement.
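A compact PyTorch sketch of CBAM-style channel and spatial attention following the steps above. The reduction ratio and class names are assumptions for illustration; the pooling operations, shared MLP, 7x7 convolution, and element-wise reweighting mirror the description.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: weight channels using global average- and max-pooled statistics
    passed through a shared MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))    # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))     # global max pooling
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * weights                    # highlight essential channels

class SpatialAttention(nn.Module):
    """SAM: pool along the channel dimension, concatenate, and apply a
    7x7 convolution to produce a spatial attention map."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)     # channel-wise average pooling
        mx = x.amax(dim=1, keepdim=True)      # channel-wise max pooling
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                       # refine key spatial regions
```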
- Objective (Upsampling Stage): Upsamples feature maps back to the original image size after the convolution and attention operations (see the sketch below).
- Structure:
- Transposed Convolutions (Deconvolutions): Used to increase the spatial dimensions of the feature maps progressively.
- Process: Each layer increases the resolution using transposed convolutions with kernel sizes and strides chosen so that the final output matches the input image dimensions.
- Activation and Normalization: The outputs are further processed using activation functions (LeakyReLU) and normalization layers to maintain stability during training.
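A minimal sketch of the upsampling path, under the assumption of two transposed-convolution stages that each double the spatial resolution; the channel widths, stage count, and final Tanh output layer are illustrative rather than the repository's exact configuration.

```python
import torch.nn as nn

def upsample_stage(in_ch, out_ch):
    """One decoder stage: a transposed convolution doubles the spatial size,
    followed by normalization and LeakyReLU for training stability."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# Example: restore a 64x64 feature map with 256 channels to a 256x256 RGB image.
decoder = nn.Sequential(
    upsample_stage(256, 128),            # 64x64  -> 128x128
    upsample_stage(128, 64),             # 128x128 -> 256x256
    nn.Conv2d(64, 3, kernel_size=7, padding=3),
    nn.Tanh(),                           # map outputs to [-1, 1]
)
```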
The discriminator is a multi-scale convolutional network designed to:
- Differentiate between real and generated images.
- Operate on multiple scales to capture global and local inconsistencies.
- Encourage the generator to produce highly realistic images by providing granular feedback.
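A minimal sketch of one way to realize such a multi-scale discriminator: the same patch-level convolutional critic is applied to the input and to progressively downsampled copies. The depth, channel widths, and number of scales are assumptions, not the repository's exact configuration.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Single-scale convolutional discriminator producing a patch-level map
    of real/fake scores (granular feedback for the generator)."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # patch scores
        )

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Applies a discriminator at several image scales so that both global
    and local inconsistencies are penalized."""
    def __init__(self, num_scales=3):
        super().__init__()
        self.discs = nn.ModuleList(PatchDiscriminator() for _ in range(num_scales))
        self.downsample = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, x):
        outputs = []
        for disc in self.discs:
            outputs.append(disc(x))
            x = self.downsample(x)   # the next scale sees a half-resolution copy
        return outputs
```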
- Quality Loss: Guides the generator to align the distribution of its outputs with that of high-quality images.
- Consistency Loss: Ensures that the enhanced images retain core content features from the input image.
- Identity Loss: Preserves input properties for high-quality images during enhancement, avoiding unnecessary alterations.
The total generator loss function is a weighted sum of these components, promoting both visual fidelity and structural consistency.
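A sketch of how the total generator loss could be composed as a weighted sum of the three terms above. The weight values, loss formulations (a least-squares adversarial term and L1 content terms), and function signature are placeholders rather than the repository's actual choices.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake, enhanced, low_quality, identity_out, high_quality,
                   w_quality=1.0, w_consistency=10.0, w_identity=5.0):
    """Weighted sum of quality, consistency, and identity terms.
    All weights here are placeholders, not the repository's coefficients."""
    # Quality (adversarial) loss: push discriminator scores for enhanced
    # images toward the "real" label.
    quality = F.mse_loss(d_fake, torch.ones_like(d_fake))
    # Consistency loss: the enhanced output should retain the input's content.
    consistency = F.l1_loss(enhanced, low_quality)
    # Identity loss: a high-quality input passed through the generator
    # should come out (nearly) unchanged.
    identity = F.l1_loss(identity_out, high_quality)
    return w_quality * quality + w_consistency * consistency + w_identity * identity
```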
The unsupervised image enhancement algorithm presented in this repository offers several notable advantages, including innovative architectural features and superior performance metrics when compared to existing state-of-the-art algorithms. Here are the key benefits and comparative results:
- Unsupervised Training Paradigm: Unlike many existing methods that require paired data (i.e., low-quality and high-quality image pairs), our algorithm operates in an unsupervised setting. This eliminates the need for precise, matched datasets, which are often challenging and expensive to acquire in fields such as dental imaging.
- Hybrid CNN and GAN Approach with Attention Mechanisms:
- The combination of Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GANs) allows the model to effectively learn image transformations and adversarial dynamics for high-fidelity results.
- The Global Attention Module (GAM) ensures that the generator can capture overall image characteristics, such as lighting and color balance, before making detailed adjustments.
- The Local Feature Enhancement (LFE) module, inspired by CBAM attention mechanisms, allows for fine-grained control over detail enhancement by emphasizing spatial and channel-wise attention.
- Performance on Benchmark Datasets:
- The algorithm was tested on the MIT-Adobe FiveK dataset and demonstrated substantial improvements over state-of-the-art methods such as CycleGAN and DnCNN.
- Quantitative metrics used for comparison:
- Peak Signal-to-Noise Ratio (PSNR): Our model achieved an average PSNR improvement of 2.5 dB over CycleGAN and 1.8 dB over DnCNN on tested images, indicating superior image quality with less distortion.
- Structural Similarity Index (SSIM): Our approach attained a significant increase in SSIM scores compared to competing algorithms, reflecting better structural preservation and perceptual quality.
- Neural Image Assessment (NIMA): Our model consistently produced aesthetically pleasing images, achieving a higher NIMA score compared to other state-of-the-art methods, reflecting human perception-driven quality.
- Robustness and Generalization:
- The architecture was shown to generalize effectively across different types of input images without requiring significant retraining or fine-tuning. This makes it adaptable for real-world scenarios where input images can vary substantially in quality and features.
- High-Quality Outputs with Attention to Detail:
- The use of combined global and local attention mechanisms enables the model to enhance both large-scale and minute image features, ensuring visually compelling outputs that maintain global coherence while highlighting critical details.
- This dual attention strategy ensures that the model does not overfit to global characteristics or neglect important details, achieving an optimal balance.
- Efficient Training Process:
- The model architecture, training schedule, and use of advanced optimization strategies (such as adaptive learning rates with the Adam optimizer) ensure faster convergence and stable training compared to many existing GAN-based approaches.
- CycleGAN:
- PSNR: Our model outperforms CycleGAN by an average of 2.5 dB, producing significantly higher-quality reconstructions.
- SSIM: Demonstrates higher similarity to ground truth images with improved structural integrity.
- NIMA: Generates more aesthetically appealing images, as evidenced by higher scores.
- DnCNN:
- Noise Removal and Detail Preservation: While DnCNN excels at removing noise, our approach surpasses it in preserving image structure and details, particularly in complex scenes and fine-grained content.
- PSNR and SSIM Improvements: Achieved higher PSNR and SSIM scores, confirming enhanced fidelity and reduced image distortion.
- Framework: PyTorch
- Training Details:
- Trained using the Adam optimizer with an initial learning rate of 0.0001.
- Learning rate decayed linearly after 150 epochs for stable convergence.
- The model was trained on the MIT-Adobe FiveK dataset.
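A sketch of the training configuration described above (Adam, initial learning rate 1e-4, linear learning-rate decay starting at epoch 150). The total epoch count and the placeholder model are assumptions.

```python
import torch
import torch.nn as nn

# Placeholder model; substitute the actual generator network.
generator = nn.Conv2d(3, 3, kernel_size=3, padding=1)

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

total_epochs = 300   # assumed; the repository only states that decay starts at epoch 150
decay_start = 150

def linear_decay(epoch):
    # Constant learning rate for the first 150 epochs, then linear decay toward zero.
    if epoch < decay_start:
        return 1.0
    return max(0.0, 1.0 - (epoch - decay_start) / (total_epochs - decay_start))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=linear_decay)

# for epoch in range(total_epochs):
#     ...train one epoch...
#     scheduler.step()
```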
- Evaluation Metrics:
- Peak Signal-to-Noise Ratio (PSNR)
- Structural Similarity Index (SSIM)
- Neural Image Assessment (NIMA)
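For reference, a minimal sketch of the PSNR computation used in evaluation; SSIM and NIMA typically rely on dedicated implementations (e.g., scikit-image for SSIM, a trained aesthetic model for NIMA), so only PSNR is shown.

```python
import numpy as np

def psnr(reference, enhanced, max_value=255.0):
    """Peak Signal-to-Noise Ratio (dB) between two images of identical shape."""
    mse = np.mean((reference.astype(np.float64) - enhanced.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_value ** 2) / mse)
```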