IEEE Signal Processing Letters, 2021.
Abstract: Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning classification methods are trained on clean images and are not robust when handling noisy ones, even if a restoration preprocessing step is applied. While novel methods address this problem, they rely on modified feature extractors and thus necessitate retraining. We instead propose a method that can be applied on a pretrained classifier. Our method exploits a fidelity map estimate that is fused into the internal representations of the feature extractor, thereby guiding the attention of the network and making it more robust to noisy data. We improve the noisy-image classification (NIC) results by significantly large margins, especially at high noise levels, and come close to the fully retrained approaches. Furthermore, as proof of concept, we show that when using our oracle fidelity map we even outperform the fully retrained methods, whether trained on noisy or restored images.
- Degradation Model
- Requirements
- Model Training and Testing
- Baseline Methods and Ablation Study
- Results
- Citation
To explore the effects of degradation types and levels on classification networks, we also implement five types of degradation models: additive white Gaussian noise (AWGN), salt-and-pepper noise, Gaussian blur, motion blur, and rectangle crop. Instructions for these degradation models are given in this notebook.
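As a rough illustration of the first two degradation types, here is a minimal pure-Python sketch. It is not the repository's implementation: the function names, the [0, 1] pixel range, the clipping step, and reading the degradation level as the noise standard deviation (AWGN) or the corruption probability (salt and pepper) are all assumptions made for this example.

```python
import random


def add_awgn(pixels, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`, then clip to [0, 1]."""
    # Assumption: pixel values live in [0, 1] and are clipped after adding noise.
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def add_salt_and_pepper(pixels, amount, rng):
    """Replace each pixel, with probability `amount`, by 0.0 (pepper) or 1.0 (salt)."""
    noisy = []
    for p in pixels:
        if rng.random() < amount:
            noisy.append(1.0 if rng.random() < 0.5 else 0.0)
        else:
            noisy.append(p)
    return noisy


rng = random.Random(0)   # fixed seed so the example is repeatable
image = [0.5] * 16       # a 4x4 gray image, flattened row by row
noisy_awgn = add_awgn(image, sigma=0.1, rng=rng)
noisy_sp = add_salt_and_pepper(image, amount=0.25, rng=rng)
```

A real pipeline would apply the same idea to image tensors rather than flat lists; the flat list only keeps the sketch free of dependencies.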
- Python 3.7, PyTorch 2.1.0;
- Other common packages listed in requirements.txt or environment.yml.
For the DnCNN denoiser, the parameter initialization follows He et al. We change the loss function of the original paper to as it achieves better convergence performance.
To train the classification networks, we fine-tune models pretrained on the ImageNet dataset. The fully connected layers are modified to fit the number of classes of each dataset (e.g., 257 for Caltech-256). We adopt the same initialization as He et al., i.e., the Xavier algorithm, and the biases are initialized to 0. We use the Nesterov accelerated gradient (NAG) optimizer with an initial learning rate of 0.001 and train for 120 epochs. We also use a linear learning rate warmup, updated at every batch step, for the first 5 epochs, followed by a cosine learning rate decay, and apply label smoothing with . We select the model with the highest accuracy on the validation set.
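The learning rate schedule described above can be sketched as a function of the batch step. This is only an illustration under stated assumptions: the function name and `steps_per_epoch` are hypothetical, the warmup is assumed to start near zero, and the cosine decay is assumed to be applied per batch and to end at zero. The repository's training script may differ in all of these details.

```python
import math


def learning_rate(step, steps_per_epoch, base_lr=0.001, warmup_epochs=5, total_epochs=120):
    """Learning rate at a given batch step (0-indexed)."""
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        # Linear warmup, updated every batch: ramps from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr towards 0 over the remaining steps (assumed end value).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Hypothetical setting: 10 batches per epoch, so 1200 steps in total.
schedule = [learning_rate(step, steps_per_epoch=10) for step in range(1200)]
```

Updating the rate per batch instead of per epoch keeps the warmup smooth, which is the usual reason for a batch-step warmup.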
The implementation of the classification networks is taken from torchvision, and the restoration networks are based on DnCNN and MemNet.
- To obtain pretrained classification networks:
  python train.py --task classification --classification resnet50 --dataset caltech256 --num_class 257
  - The --classification argument takes value in 'resnet50', 'resnet18', 'alexnet', 'googlenet', 'vgg';
  - The --dataset and --num_class arguments take values in 'caltech256', 257 and 'caltech101', 101 respectively.
- To obtain pretrained restoration networks:
  python train.py --task=restoration --degradation=awgn --restoration=dncnn --level 0 0.5 --batch_size 256
  - The --restoration argument takes value in 'dncnn', 'memnet'.
- To obtain retrained classification networks on degraded images:
  python train.py --task classification --classification resnet50 --degradation awgn --level 0 0.1 0.2 0.3 0.4 0.5
- To obtain retrained classification networks on restored images:
  python train.py --task classification --classification resnet50 --degradation awgn --level 0 0.1 0.2 0.3 0.4 0.5 --restoration dncnn
- To obtain our pretrained fidelity map estimator:
  python train.py --task fidelity --degradation awgn --restoration dncnn --level 0 0.5 --fidelity_input degraded --fidelity_output l1 --batch_size 256 --num_epochs 60
  - The --fidelity_input argument takes value in 'degraded', 'restored';
  - The --fidelity_output argument takes value in 'l1', 'l2', 'cos'.
- To train the proposed model:
  python train.py --task model --mode oracle --classification resnet50 --degradation awgn --restoration dncnn --level 0 0.1 0.2 0.3 0.4 0.5 --fidelity_input degraded --fidelity_output l1 --num_epochs 60 --dataset caltech256 --num_class 257
  - The --mode argument takes value in 'endtoend-pretrain', 'pretrain', 'oracle'.
- To test the proposed model:
  python test.py --task model --mode oracle --classification resnet50 --degradation awgn --level 0.1 --restoration dncnn --fidelity_input degraded --fidelity_output l1 --is_ensemble True
  - The --is_ensemble argument takes value in 'True', 'False'.
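To evaluate the proposed model at several noise levels, the test command shown above can be generated once per level. The helper below is a hypothetical convenience wrapper, not part of the repository; it only assembles the flags documented above and assumes that test.py is run once for each level.

```python
def build_test_command(level, classification="resnet50", restoration="dncnn", is_ensemble=True):
    """Return the argument list for one test.py run at a single noise level."""
    return [
        "python", "test.py",
        "--task", "model",
        "--mode", "oracle",
        "--classification", classification,
        "--degradation", "awgn",
        "--level", str(level),
        "--restoration", restoration,
        "--fidelity_input", "degraded",
        "--fidelity_output", "l1",
        "--is_ensemble", str(is_ensemble),
    ]


levels = [0.1, 0.2, 0.3, 0.4, 0.5]
commands = [build_test_command(level) for level in levels]
for command in commands:
    # Printed rather than executed; use subprocess.run(command, check=True) to launch a run.
    print(" ".join(command))
```

Building the command as a list keeps each flag and value separate, so it can be passed to subprocess without shell quoting issues.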
- We provide two baseline methods for a comprehensive analysis. To train and test the baseline methods:
  - WaveCNet
    - train: python train.py --task wavecnet --classification resnet50;
    - test: python test.py --task wavecnet --classification resnet50 --degradation awgn --level 0.1;
  - DeepCorrect
    - train: python train.py --task deepcorrect --classification resnet50 --degradation awgn --level 0 0.1 0.2 0.3 0.4 0.5 --num_epochs 60;
    - test: python test.py --task deepcorrect --classification resnet50 --degradation awgn --level 0.1.
- We also provide some in-depth analysis and ablation study models:
  - To try different fidelity map inputs and outputs, you can use the --fidelity_input and --fidelity_output arguments;
  - To try different downsampling methods, you can use the --downsample argument, which takes value in 'bicubic', 'bilinear', 'nearest';
  - For the ablation study, you can use the --ablation argument, which takes value in 'spatialmultiplication', 'residualmechanism', 'spatialaddition', 'channelmultiplication', 'channelconcatenation';
  - Note: For more details on the ablation study models, please refer to our paper.
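To make the options above more concrete, here is a small pure-Python sketch of what a fidelity map, a nearest-neighbor downsampling step, and a spatial multiplication fusion could look like. Everything in it is an assumption made for illustration: this README does not define these operations, so the per-pixel definitions of 'l1', 'l2' and 'cos', the function names, and the residual form `x + x * m` are hypothetical. Please refer to the paper for the actual formulation.

```python
import math


def pixel_fidelity(c, o, output):
    """Compare two pixels given as channel tuples."""
    if output == "l1":      # mean absolute difference (0 means identical)
        return sum(abs(a - b) for a, b in zip(c, o)) / len(c)
    if output == "l2":      # mean squared difference (0 means identical)
        return sum((a - b) ** 2 for a, b in zip(c, o)) / len(c)
    if output == "cos":     # cosine similarity across channels (1 means same direction)
        dot = sum(a * b for a, b in zip(c, o))
        norm = math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in o))
        return dot / norm if norm > 0 else 0.0
    raise ValueError("unknown output: " + output)


def fidelity_map(clean, other, output="l1"):
    """Per-pixel map between two H x W images (lists of rows of channel tuples)."""
    return [[pixel_fidelity(c, o, output) for c, o in zip(crow, orow)]
            for crow, orow in zip(clean, other)]


def downsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resize of a 2D map (list of rows) to out_h x out_w."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
            for i in range(out_h)]


def fuse_spatial_multiplication(features, fmap, residual=True):
    """Scale every channel of `features` (C x H x W nested lists) by one H x W map."""
    return [[[x * m + (x if residual else 0.0) for x, m in zip(row, map_row)]
             for row, map_row in zip(channel, fmap)]
            for channel in features]


# Toy data: a 4x4 red image whose top-left pixel is corrupted to green.
clean = [[(1.0, 0.0, 0.0)] * 4 for _ in range(4)]
degraded = [list(row) for row in clean]
degraded[0][0] = (0.0, 1.0, 0.0)

full_map = fidelity_map(clean, degraded, output="cos")
small_map = downsample_nearest(full_map, 2, 2)      # match a 2x2 feature resolution
features = [[[2.0, 2.0], [2.0, 2.0]]]               # one feature channel of size 2x2
fused = fuse_spatial_multiplication(features, small_map, residual=False)
fused_residual = fuse_spatial_multiplication(features, small_map, residual=True)
```

The example uses the 'cos' output because larger values then mean higher fidelity, which makes multiplication easy to interpret. The distance-based outputs would need a mapping from distance to weight that this sketch does not attempt.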
Aside from the results in our main paper and supplementary material, we also illustrate the performance of the proposed method on other classification (e.g. AlexNet in the figure below on the left) and restoration networks (e.g. MemNet in the figure below on the right). The performance of the proposed method on other networks parallels that on ResNet-50 and DnCNN in our paper. This shows that the proposed method is model-agnostic and can be used on other networks.
The above figure on the left: Classification results with the AlexNet classification network and the DnCNN restoration network, on the Caltech-256 dataset, for various setups. The solid lines indicate testing directly on noisy images. The dashed lines indicate testing with the DnCNN restoration preprocessing step.
The above figure on the right: Classification results with the ResNet-50 classification network and the MemNet restoration network, on the Caltech-256 dataset, for various setups. The solid lines indicate testing directly on noisy images. The dashed lines indicate testing with the MemNet restoration preprocessing step.
The CUB-200-2011 dataset is an image dataset of 200 bird species. There are 5994 training images and 5794 test images. We randomly chose 20 percent of the training set for validation. The results are given in the table below.
| Methods | Experimental setup | Uniform degradation (sigma) 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|
| Pretrained | Test on noisy | 34.89 | 08.11 | 02.02 | 00.89 | 00.70 |
| Pretrained | Test on restored | 56.77 | 42.37 | 30.97 | 23.15 | 16.91 |
| Retrain on noisy | Test on noisy | 59.91 | 55.86 | 51.41 | 46.94 | 42.09 |
| Retrain on noisy | Test on restored | 58.37 | 52.56 | 44.97 | 37.76 | 31.33 |
| Retrain on restored | Test on noisy | 52.53 | 24.46 | 07.18 | 01.86 | 00.85 |
| Retrain on restored | Test on restored | 63.34 | 59.51 | 54.76 | 49.83 | 44.63 |
| FG-NIC (Pretrained) | Single | 63.75 | 56.98 | 48.87 | 40.55 | 32.82 |
| FG-NIC (Pretrained) | Ensemble | 64.95 | 57.37 | 48.74 | 40.33 | 32.38 |
| FG-NIC (Oracle) | Single | 65.10 | 60.21 | 55.26 | 50.77 | 46.10 |
| FG-NIC (Oracle) | Ensemble | 65.75 | 60.95 | 55.75 | 51.15 | 46.32 |
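The random 20 percent validation split of the CUB-200-2011 training set mentioned above could be drawn as in the following sketch. The function name, the fixed seed, and rounding the validation size down are assumptions made for this example; the README does not state how the split was actually generated.

```python
import random


def split_train_validation(num_images, validation_fraction=0.2, seed=0):
    """Randomly split image indices into (train, validation) lists."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)   # hypothetical seed, for reproducibility
    num_validation = int(num_images * validation_fraction)   # rounds down (assumption)
    return indices[num_validation:], indices[:num_validation]


# 5994 is the number of CUB-200-2011 training images stated above.
train_idx, val_idx = split_train_validation(5994)
```

Fixing the seed keeps the validation set identical across runs, so model selection on validation accuracy stays comparable between experiments.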
@article{lin2021fidelity,
title={Fidelity Estimation Improves Noisy-Image Classification with Pretrained Networks},
author={Xiaoyu Lin and Deblina Bhattacharjee and Majed El Helou and Sabine Süsstrunk},
journal={IEEE Signal Processing Letters},
year={2021},
publisher={IEEE}
}