This repository contains a PyTorch and PyTorch Lightning implementation of the paper "Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation". It is a well-documented version of the original repository, with the code flow available here. The paper addresses the problem of weakly-supervised semantic segmentation (WSSS) using bounding-box annotations by proposing two novel methods:
- Background-Aware Pooling (BAP), to extract high-quality pseudo segmentation labels (see the sketch after this list)
- Noise-Aware Loss (NAL), to make the networks less susceptible to incorrect labels
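To make the pooling idea concrete, here is a minimal sketch in plain PyTorch. It is an illustration under simplifying assumptions, not the repository's implementation: the paper divides the background into a grid of regional prototypes (see the grid-size ablation in the results), while this sketch uses a single global background prototype, and the function name is hypothetical.

```python
# Minimal sketch of Background-Aware Pooling (BAP). NOT the repository's exact
# implementation: the paper uses a grid of regional background prototypes,
# while this sketch uses a single global prototype for brevity.
import torch
import torch.nn.functional as F


def background_aware_pooling(feats, bg_mask, box_mask):
    """Pool foreground features inside a box, down-weighting background-like pixels.

    feats:    (C, H, W) backbone feature map.
    bg_mask:  (H, W) binary mask, 1 for pixels outside every bounding box.
    box_mask: (H, W) binary mask, 1 for pixels inside the target box.
    """
    C, H, W = feats.shape
    flat = feats.reshape(C, -1)                                      # (C, H*W)

    # Background prototype: mean feature over definite-background pixels.
    prototype = flat[:, bg_mask.reshape(-1) > 0].mean(dim=1, keepdim=True)

    # Background attention: similarity of every pixel to the prototype.
    attn = F.cosine_similarity(flat, prototype, dim=0).clamp(min=0)  # (H*W,)

    # Aggregate box pixels, weighted by how UNLIKE background they look.
    w = (1.0 - attn) * box_mask.reshape(-1).float()
    w = w / (w.sum() + 1e-6)
    return (flat * w).sum(dim=1)              # (C,) pooled foreground feature
```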
Visual comparison of pseudo ground-truth labels
For more information, please check out the project site [website].
The code was developed and tested with Python >= 3.6. To install the requirements:

```bash
pip install -r requirements.txt
```
VOC dataset setup:

```bash
bash data/setup_voc.bash /path-to-data-directory
```

VOC bounding-box setup:

```bash
python3 utils/voc_bbox_setup.py --data-root /path-to-data-directory
```

COCO bounding-box setup:

```bash
python3 utils/coco_bbox_setup.py --data-root /path-to-data-directory
```
Once finished, the `data` folder should look like this:
VOC:

```
data
└── VOCdevkit
    └── VOC2012
        ├── JPEGImages
        ├── SegmentationClassAug
        ├── Annotations
        ├── ImageSets
        ├── BgMaskfromBoxes
        └── Generation
            ├── Y_crf
            └── Y_ret
```
The training procedure is divided into three stages; example commands for each are given below. Hyperparameters can be adjusted in the corresponding configuration files.
Change the `MODEL.GAP` parameter in the config file to choose the pooling method used for training:
- `True`: the Global Average Pooling (GAP) method
- `False`: the proposed Background-Aware Pooling (BAP) method
By default, all models are trained on the augmented PASCAL VOC dataset containing 10,582 images; to train on the non-augmented dataset, set the `DATA.AUG` parameter to `False`.
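For orientation, below is a hypothetical excerpt of `configs/stage1.yml`; only the `MODEL.GAP`, `DATA.ROOT`, and `DATA.AUG` keys are taken from this README, and the actual file layout may differ.

```yaml
# Hypothetical config excerpt; surrounding structure is illustrative only.
MODEL:
  GAP: False   # False: Background-Aware Pooling, True: Global Average Pooling
DATA:
  ROOT: data   # dataset root (see the directory layout above)
  AUG: True    # True: augmented PASCAL VOC (10,582 images)
```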
Stage 1:

```bash
python3 stage1.py --config-file configs/stage1.yml --gpu-id 0
```
Stage 2:

```bash
python3 stage2.py --config-file configs/stage2.yml --gpu-id 0
```
Stage 3, DeepLab Large FOV (VGG Backbone):

```bash
python3 stage3.py --config-file configs/stage3_vgg.yml --gpu-id 0
```

Stage 3, DeepLab ASPP (ResNet Backbone):

```bash
python3 stage3.py --config-file configs/stage3_res.yml --gpu-id 0
```
Change the `MODEL.LOSS` parameter in the corresponding config file to choose the training loss (a minimal sketch of the NAL option follows this list):
- `NAL`: the proposed Noise-Aware Loss using Ycrf and Yret
- `ER`: the Entropy Regularization loss using Ycrf and Yret
- `BS`: the Bootstrapping loss using Ycrf and Yret
- `BASELINE`: the cross-entropy loss using only the reliable regions obtained from Ycrf and Yret
- `CE_CRF`: the cross-entropy loss using Ycrf
- `CE_RET`: the cross-entropy loss using Yret
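To make the NAL option concrete, here is a minimal sketch of its spirit in plain PyTorch: cross-entropy on pixels where Ycrf and Yret agree, plus a down-weighted cross-entropy on the disagreeing (noisy) pixels. The repository's exact weighting scheme may differ, and the function name is hypothetical.

```python
# Minimal sketch of a Noise-Aware Loss; an illustration, not the reference
# implementation.
import torch
import torch.nn.functional as F

IGNORE = 255  # standard PASCAL VOC ignore label


def noise_aware_loss(logits, y_crf, y_ret, noisy_weight=0.1):
    """logits: (B, K, H, W); y_crf, y_ret: (B, H, W) integer pseudo labels."""
    reliable = y_crf == y_ret
    loss = logits.new_zeros(())

    # Plain cross-entropy on pixels where the two pseudo labels agree.
    if reliable.any():
        agree = torch.where(reliable, y_crf, torch.full_like(y_crf, IGNORE))
        loss = loss + F.cross_entropy(logits, agree, ignore_index=IGNORE)

    # Down-weighted cross-entropy on the disagreeing (noisy) pixels,
    # using Ycrf as the (possibly incorrect) target.
    if (~reliable).any():
        noisy = torch.where(reliable, torch.full_like(y_crf, IGNORE), y_crf)
        loss = loss + noisy_weight * F.cross_entropy(
            logits, noisy, ignore_index=IGNORE)
    return loss
```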
Change the `DATA.ROOT` and `MODEL.NUM_CLASSES` parameters in the Stage 2 config file for pseudo-label generation on the MS COCO 2017 train set:

```bash
python3 stage2_voc_to_coco.py --config-file configs/stage2.yml --gpu-id 0
```
To evaluate the model on the PASCAL VOC 2012 validation set before and after DenseCRF post-processing, set the `DATA.MODE` parameter to `val` in the corresponding config file:
DeepLab Large FOV (VGG Backbone):

```bash
python3 stage3.py --config-file configs/stage3_vgg.yml --gpu-id 0
```

DeepLab ASPP (ResNet Backbone):

```bash
python3 stage3.py --config-file configs/stage3_res.yml --gpu-id 0
```
Evaluation is performed on the raw validation images, reporting the mean accuracy and mIoU metrics both before and after DenseCRF processing.
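For reference, here is a minimal sketch of DenseCRF post-processing with the `pydensecrf` package; the kernel hyperparameters are common defaults, not necessarily the values used in this repository's configs, and the function name is hypothetical.

```python
# Minimal sketch of DenseCRF refinement of softmax predictions.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax


def densecrf_refine(image, probs, n_iters=10):
    """image: (H, W, 3) uint8 RGB; probs: (K, H, W) float32 softmax output."""
    K, H, W = probs.shape
    d = dcrf.DenseCRF2D(W, H, K)
    d.setUnaryEnergy(unary_from_softmax(probs))

    # Smoothness kernel (nearby pixels) and appearance kernel (similar colors).
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image), compat=10)

    q = np.array(d.inference(n_iters)).reshape(K, H, W)
    return q.argmax(axis=0)  # (H, W) refined label map
```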
We achieve the following results:
- Comparison of pseudo labels on the PASCAL VOC 2012 validation set in terms of mIoU
Method | Original Author's Results | Our Results |
---|---|---|
GAP | 76.1 | 75.5 |
BAP Ycrf w/o u0 | 77.8 | 77.0 |
BAP Ycrf | 79.2 | 78.8 |
BAP Yret | 69.9 | 69.9 |
BAP Ycrf & Yret | 68.2 | 72.7 |
- Comparison of mIoU scores using different losses on the PASCAL VOC 2012 training set. We report mIoU scores both before and after applying DenseCRF.
Method | Original Author's Results | Our Results |
---|---|---|
Baseline | 61.8 / 67.5 | 60.9 / 64.5 |
w/ Entropy Regularization | 61.4 / 67.3 | 60.8 / 64.1 |
w/ Bootstrapping | 61.9 / 67.6 | 60.9 / 64.6 |
w/ Lwce | 62.4 / 68.1 | 61.4 / 64.8 |
- Quantitative comparison using DeepLab-V1 (VGG-16) on the PASCAL VOC 2012 dataset in terms of mIoU (weakly-supervised learning)

Method | Original Author's Results | Our Results |
---|---|---|
w/ Ycrf | 67.8 | 64.7 |
w/ Yret | 66.1 | 62.8 |
w/ NAL | 68.1 | 64.8 |
w/ NAL (test) | 69.4 | 65.6 |

- Quantitative comparison using DeepLab-V2 (ResNet-101) on the PASCAL VOC 2012 dataset in terms of mIoU (weakly-supervised learning)

Method | Original Author's Results | Our Results |
---|---|---|
w/ Ycrf | 74.0 | 67.0 |
w/ Yret | 72.4 | 70.2 |
w/ NAL | 74.6 | 70.8 |
w/ NAL (test) | 76.1 | 71.7 |

- Quantitative comparison of pseudo labels on the MS COCO train set for the model trained on PASCAL VOC

Results on COCO train | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
BAP: Ycrf (Authors) | 11.7 | 28.7 | 8.0 | 3.0 | 15.0 | 27.1 |
BAP: Ycrf (Ours) | 8.6 | 20.1 | 6.5 | 1.9 | 8.8 | 15.9 |
BAP: Yret (Authors) | 9.0 | 30.1 | 2.8 | 4.4 | 10.2 | 16.2 |
BAP: Yret (Ours) | 6.6 | 20.2 | 2.5 | 3.3 | 5.7 | 10.6 |

- Comparison of pseudo labels on the PASCAL VOC 2012 validation set in terms of mIoU for different values of grid size

Grid Size | 1 | 2 | 3 |
---|---|---|---|
1 | 75.82 | 75.77 | 75.65 |
2 | 76.11 | 76.10 | 75.15 |
3 | 75.87 | 75.78 | 75.81 |
4 | 78.83 | 78.72 | 78.82 |
5 | 74.16 | 74.07 | 74.02 |
Qualitative results (input image, ground truth, Ycrf, Yret).
Aryan Mehta, Karan Uppal, Kaushal Jadhav, Monish Natarajan and Mradul Agrawal
This repository is maintained by AGV.AI (IIT Kharagpur)
- [RE] Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

```bibtex
@article{Mehta:2022,
  title   = {{[Re] Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation}},
  author  = {Mehta, Aryan and Uppal, Karan and Jadhav, Kaushal and Natarajan, Monish and Agrawal, Mradul and Chakravarty, Debashish},
  journal = {ReScience C},
  year    = {2022},
  doi     = {10.5281/zenodo.6574677},
}
```
- Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

```bibtex
@inproceedings{oh2021background,
  title     = {Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation},
  author    = {Oh, Youngmin and Kim, Beomjun and Ham, Bumsub},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021},
}
```
- PASCAL VOC 2012 Setup adopted from deeplab-pytorch