[ECCV 2024] DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation

DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation

This repository is the official implementation of "DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation".

Update

[07/02/2024] Our DHR has been accepted to ECCV 2024. 🔥🔥🔥

[04/02/2024] Released initial commits.

Citation

Please cite our paper if the code is helpful to your research.

@inproceedings{jo2024dhr,
      title={DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation}, 
      author={Sanghyun Jo and Fei Pan and In-Jae Yu and Kyungsu Kim},
      booktitle={European Conference on Computer Vision (ECCV)},
      year={2024}
}

Abstract

Weakly-supervised semantic segmentation (WSS) ensures high-quality segmentation with limited data and excels when employed as input seed masks for large-scale vision models such as Segment Anything. However, WSS faces challenges related to minor classes since those are overlooked in images with adjacent multiple classes, a limitation originating from the overfitting of traditional expansion methods like Random Walk. We first address this by employing unsupervised and weakly-supervised feature maps instead of conventional methodologies, allowing for hierarchical mask enhancement. This method distinctly categorizes higher-level classes and subsequently separates their associated lower-level classes, ensuring all classes are correctly restored in the mask without losing minor ones. Our approach, validated through extensive experimentation, significantly improves WSS across five benchmarks (VOC: 79.8%, COCO: 53.9%, Context: 49.0%, ADE: 32.9%, Stuff: 37.4%), reducing the gap with fully supervised methods by over 84% on the VOC validation set.

Overview

Setup

Setting up this project involves installing dependencies and preparing datasets. The code is tested on Ubuntu 20.04 with NVIDIA GPUs and CUDA installed.

Installing dependencies

To install all dependencies, please run the following:

pip install -U "ray[default]"
pip install git+https://github.com/lucasb-eyer/pydensecrf.git
python3 -m pip install -r requirements.txt

Alternatively, reproduce our results using Docker:

docker build -t dhr_pytorch:v1.13.1 .
docker run --gpus all -it --rm \
--shm-size 32G --volume="$(pwd):$(pwd)" --workdir="$(pwd)" \
dhr_pytorch:v1.13.1
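After either installation route, a quick import check can confirm that the main packages are visible to the interpreter. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the module list (`ray`, `pydensecrf`, `torch`) is an assumption taken from the commands and Docker tag above, since the contents of `requirements.txt` are not listed here.

```python
import importlib.util

# Assumed top-level module names, inferred from the install commands above.
# requirements.txt may pull in more packages than these.
ASSUMED_MODULES = ["ray", "pydensecrf", "torch"]


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed modules were found.")
```

The check only looks the modules up without importing them, so it runs quickly and does not initialize CUDA.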

Preparing datasets

Please download the following datasets: VOC, COCO, Context, ADE, and COCO-Stuff. Because each dataset ships with a different directory structure, we reorganized all of them into a common layout to simplify the implementation.

1. PASCAL VOC 2012

Download the PASCAL VOC 2012 dataset from our [Google Drive].

2. MS COCO 2014

Download the MS COCO 2014 dataset from our [Google Drive].

3. Pascal Context

Download the Pascal Context dataset from our [Google Drive].

4. ADE 2016

Download the ADE 2016 dataset from our [Google Drive].

5. COCO-Stuff

Download the COCO-Stuff dataset from our [Google Drive].

6. Open-vocabulary Segmentation Models

Download [all results] and [the reproduced project] for a fair comparison with WSS.

Create the dataset directories (for example, "../VOC2012/") in the parent directory and place each dataset so that the layout matches the following structure.

    ../                               # parent directory
    ├── ./                            # current (project) directory
    │   ├── core/                     # (dir.) implementation of our DHR (e.g., OT)
    │   ├── tools/                    # (dir.) helper functions
    │   ├── experiments/              # (dir.) checkpoints and WSS masks
    │   ├── README.md                 # instruction for a reproduction
    │   └── ... some python files ...
    │
    ├── WSS/                          # WSS masks across all training and testing datasets
    │   ├── VOC2012/          
    │   │   ├── RSEPM/        
    │   │   ├── MARS/
    │   │   └── DHR/
    │   ├── COCO2014/
    │   │   └── DHR/
    │   ├── PascalContext/
    │   │   └── DHR/
    │   ├── ADE2016/   
    │   │   └── DHR/
    │   └── COCO-Stuff/
    │       └── DHR/
    │
    ├── GroundingDINO_Ferret_SAM/     # reproduced project for Grounding DINO and Ferret with SAM
    │   ├── core/                     # (dir.) implementation details
    │   ├── tools/                    # (dir.) helper functions
    │   ├── weights/                  # (dir.) checkpoints of Grounding DINO and Ferret
    │   ├── README.md                 # instruction for implementing Grounding DINO and Ferret
    │   └── ... some python files ...
    │
    ├── OVSeg/                        # SAM-based outputs of Grounding DINO and Ferret for a fair comparison
    │   ├── VOC2012/      
    │   │   ├── GroundingDINO+SAM/
    │   │   └── Ferret+SAM/
    │   ├── COCO2014/
    │   │   ├── GroundingDINO+SAM/
    │   │   └── Ferret+SAM/
    │   ├── PascalContext/
    │   │   ├── GroundingDINO+SAM/
    │   │   └── Ferret+SAM/
    │   ├── ADE2016/   
    │   │   ├── GroundingDINO+SAM/
    │   │   └── Ferret+SAM/
    │   └── COCO-Stuff/
    │       ├── GroundingDINO+SAM/
    │       └── Ferret+SAM/
    │
    ├── VOC2012/                      # PASCAL VOC 2012
    │   ├── train_aug/
    │   │   ├── image/     
    │   │   ├── mask/        
    │   │   └── xml/   
    │   ├── validation/
    │   │   ├── image/     
    │   │   ├── mask/        
    │   │   └── xml/   
    │   └── test/
    │       └── image/
    │
    ├── COCO2014/                     # MS COCO 2014
    │   ├── train/              
    │   │   ├── image/     
    │   │   ├── mask/        
    │   │   └── xml/
    │   └── validation/
    │       ├── image/     
    │       ├── mask/        
    │       └── xml/
    │
    ├── PascalContext/                # PascalContext
    │   ├── train/              
    │   │   ├── image/     
    │   │   ├── mask/        
    │   │   └── xml/
    │   └── validation/
    │       ├── image/     
    │       ├── mask/        
    │       └── xml/
    │
    ├── ADE2016/                      # ADE2016
    │   ├── train/              
    │   │   ├── image/     
    │   │   ├── mask/        
    │   │   └── xml/
    │   └── validation/
    │       ├── image/     
    │       ├── mask/        
    │       └── xml/
    │
    └── COCO-Stuff/                   # COCO-Stuff
        ├── train/              
        │   ├── image/     
        │   ├── mask/        
        │   └── xml/
        └── validation/
            ├── image/     
            ├── mask/        
            └── xml/
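Before running any scripts, it can help to confirm that the datasets match the tree above. The following is a minimal sketch, not part of the repository: the helper `missing_dirs` and the `EXPECTED` table are hypothetical names, and the table simply transcribes the dataset folders shown in the tree (it does not cover `WSS/`, `OVSeg/`, or `GroundingDINO_Ferret_SAM/`).

```python
from pathlib import Path

# Dataset -> split -> subfolders, transcribed from the directory tree above.
FULL = ("image", "mask", "xml")
EXPECTED = {
    "VOC2012": {"train_aug": FULL, "validation": FULL, "test": ("image",)},
    "COCO2014": {"train": FULL, "validation": FULL},
    "PascalContext": {"train": FULL, "validation": FULL},
    "ADE2016": {"train": FULL, "validation": FULL},
    "COCO-Stuff": {"train": FULL, "validation": FULL},
}


def missing_dirs(root, datasets=None):
    """Return the expected dataset folders that do not exist under `root`."""
    root = Path(root)
    names = list(EXPECTED) if datasets is None else list(datasets)
    missing = []
    for name in names:
        for split, subdirs in EXPECTED[name].items():
            for sub in subdirs:
                path = root / name / split / sub
                if not path.is_dir():
                    missing.append(path)
    return missing


if __name__ == "__main__":
    # "../" is the parent directory used by the commands in this README.
    for path in missing_dirs("../"):
        print(f"missing: {path}")
```

Passing `datasets=["VOC2012"]` restricts the check to a single dataset, which is convenient when only some of the five benchmarks have been downloaded.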

Preprocessing

1. Training the USS method

Please download the CAUSE weights trained from scratch on these datasets: [CAUSE weights]. We follow the official CAUSE implementation to train CAUSE from scratch on the five datasets.

2. Training the WSS method

Please download and prepare the WSS masks: [WSS labels]. You can substitute masks from other WSS methods as long as they follow the same directory structure.

Training

Our training code is coming soon.

Evaluation

We release our checkpoint and the official VOC results (anonymous links).

| Method | Backbone | Checkpoints | VOC val | VOC test |
|--------|----------|-------------|---------|----------|
| DHR | ResNet-101 | [Google Drive] | [link] | [link] |

The following commands reproduce our results. Additionally, we follow the official Mask2Former implementation to train Swin-L+Mask2Former with our DHR masks on the five datasets.

# Generate the final segmentation outputs with CRF
python3 produce_wss_masks.py --gpus 0 --cpus 64 --root ../ --data VOC2012 --domain validation \
--backbone resnet101 --decoder deeplabv3+ --tag "ResNet-101@VOC2012@DeepLabv3+@DHR" --checkpoint "last"

# Calculate the mIoU
python3 evaluate.py --fix --data VOC2012 --gt ../VOC2012/validation/mask/ \
--tag "DHR" --pred "./experiments/results/VOC2012/ResNet-101@VOC2012@DeepLabv3+@DHR@last/validation/"

# Reproduce WSS performance related to official VOC results
#            DHR (Ours, DeepLabv3+) | mIoU: 79.6%, mFPR: 0.127, mFNR: 0.077
#           DHR (Ours, Mask2Former) | mIoU: 81.7%, mFPR: 0.131, mFNR: 0.052
python3 evaluate.py --fix --data VOC2012 --gt ../VOC2012/validation/mask/ \
--tag "DHR (Ours, DeepLabv3+)" --pred "./submissions_DHR@DeepLabv3+/validation/results/VOC2012/Segmentation/comp5_val_cls/"
python3 evaluate.py --fix --data VOC2012 --gt ../VOC2012/validation/mask/ \
--tag "DHR (Ours, Mask2Former)" --pred "./submissions_DHR@Mask2Former/validation/results/VOC2012/Segmentation/comp5_val_cls/"
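For readers unfamiliar with the mIoU figure reported above, the standard computation can be sketched as follows. This is a dependency-free illustration of the usual confusion-matrix definition, not the code in `evaluate.py`: the function names are hypothetical, ignoring label 255 follows the PASCAL VOC convention for void pixels, and skipping classes that appear in neither the ground truth nor the prediction is an assumption. The mFPR and mFNR metrics are not covered here.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Count (ground-truth class, predicted class) pairs over flattened label sequences."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # void pixels do not contribute to the score
        matrix[g][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur at least once."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 255]    # flattened ground-truth labels
    pred = [0, 1, 1, 1, 0]    # flattened predicted labels
    print(mean_iou(confusion_matrix(gt, pred, num_classes=2)))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. In practice the confusion matrix is accumulated over every image in the split before the per-class IoUs are averaged.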