Namkwangwoon/Saliency-Attention-based-DETR

SA-DETR: Saliency Attention-based DETR for Salient Object Detection

Saliency Attention-based DETR

"SA-DETR: Saliency Attention-based DETR for Salient Object Detection"

Overall Architecture

Prerequisites

PyTorch >= 1.5.0

Requirements
```
pip install -r requirements.txt
```

Dataset

SOC dataset

Fan, Deng-Ping, et al. "Salient objects in clutter." IEEE Transactions on Pattern Analysis and Machine Intelligence 45.2 (2022): 2344-2366.

https://github.com/DengPingFan/SODBenchmark

Training

python main_SOC.py \
  --masks \
  --no_aux_loss \
  --output_dir "output_path" \
  --epochs 200 \
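  --frozen_weights detr-r50-e632da11.pth \
  [--resume "output_checkpoint_path" --lr "lr" --lr_drop "lr_drop"]

For a ResNet-101 backbone, replace the frozen-weights flag with: --frozen_weights detr-r101-2c7b67e5.pth --backbone resnet101

The flags in square brackets are optional.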

Inference

python main_SOC.py --masks --no_aux_loss --eval

Evaluation

python pred_SOC.py --masks --no_aux_loss --eval

MAE

Perazzi, Federico, et al. "Saliency filters: Contrast based filtering for salient region detection." 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012.

$N$ : Number of pixels
$Sal$ : Saliency (output) map
$G$ : Ground-truth (GT) map
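
With the symbols above, the mean absolute error is conventionally written as $MAE = \frac{1}{N}\sum_{i=1}^{N} |Sal_i - G_i|$. The following is a minimal pure-Python sketch of that formula, assuming both maps are given as equally sized lists of rows with values in [0, 1]. The function name `mean_absolute_error` is hypothetical and is not taken from `pred_SOC.py`.

```python
def mean_absolute_error(sal, gt):
    """Average per-pixel absolute difference between two 2-D maps.

    Illustrative helper (not from this repository); `sal` and `gt`
    are lists of rows with values in [0, 1].
    """
    if len(sal) != len(gt) or any(len(s) != len(g) for s, g in zip(sal, gt)):
        raise ValueError("maps must have the same shape")
    n = sum(len(row) for row in sal)  # N: number of pixels
    total = sum(
        abs(s - g)
        for sal_row, gt_row in zip(sal, gt)
        for s, g in zip(sal_row, gt_row)
    )
    return total / n


sal = [[0.75, 0.25], [0.5, 0.0]]
gt = [[1.0, 0.0], [1.0, 0.0]]
print(mean_absolute_error(sal, gt))  # 0.25
```

Lower values are better; a prediction identical to the ground truth scores 0.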

S-measure

Fan, Deng-Ping, et al. "Structure-measure: A new way to evaluate foreground maps." Proceedings of the IEEE international conference on computer vision. 2017.

$\alpha$ : Balance parameter in [0, 1] (default 0.5)
$S_o$ : Object-aware structural similarity
$S_r$ : Region-aware structural similarity
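
For reference, the cited paper combines the two terms linearly. Written with the symbols listed above (recalled from the paper, not taken from this repository's code):

$$
S = \alpha \cdot S_o + (1 - \alpha) \cdot S_r
$$

With the default $\alpha = 0.5$, the object-aware and region-aware terms contribute equally.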

E-measure

Fan, Deng-Ping, et al. "Enhanced-alignment measure for binary foreground map evaluation." arXiv preprint arXiv:1805.10421 (2018).

$w, h$ : Width, height of map
$\phi_{FM}$ : Enhanced alignment matrix of the foreground map
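
With these symbols, the measure is the mean of the enhanced alignment matrix: $E = \frac{1}{w \times h}\sum_{x=1}^{w}\sum_{y=1}^{h}\phi_{FM}(x, y)$. The sketch below follows the construction of $\phi_{FM}$ as described in the cited paper (mean-centre both maps, compute a per-pixel alignment value, then map it to [0, 1]). It is a simplified illustration under stated assumptions: inputs are binary maps given as lists of rows, the paper's special cases for all-zero or all-one ground truth are not handled, and the name `e_measure` and the `eps` parameter are hypothetical rather than taken from `pred_SOC.py`.

```python
def e_measure(fm, gt, eps=1e-12):
    """Enhanced-alignment measure for two equally sized binary 2-D maps.

    Illustrative helper (not from this repository). `fm` is the
    predicted foreground map, `gt` the ground-truth map.
    """
    h, w = len(gt), len(gt[0])
    if len(fm) != h or any(len(r) != w for r in fm) or any(len(r) != w for r in gt):
        raise ValueError("maps must have the same rectangular shape")
    n = w * h
    mu_fm = sum(map(sum, fm)) / n  # global mean of the prediction
    mu_gt = sum(map(sum, gt)) / n  # global mean of the ground truth
    total = 0.0
    for fm_row, gt_row in zip(fm, gt):
        for f, g in zip(fm_row, gt_row):
            a_fm = f - mu_fm  # mean-centred (bias) values
            a_gt = g - mu_gt
            # Alignment value in [-1, 1]; eps avoids division by zero.
            xi = 2.0 * a_gt * a_fm / (a_gt * a_gt + a_fm * a_fm + eps)
            # Enhanced alignment value phi_FM in [0, 1].
            total += (1.0 + xi) ** 2 / 4.0
    return total / n


gt = [[1, 0], [0, 1]]
inverted = [[0, 1], [1, 0]]
print(e_measure(gt, gt))        # close to 1 for a perfect prediction
print(e_measure(inverted, gt))  # close to 0 for a fully inverted one
```

For a continuous saliency map, the prediction would first need to be binarized; the threshold choice is outside the scope of this sketch.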

Results

스크린샷 2024-02-01 오후 9 44 51

스크린샷 2024-02-01 오후 9 46 11

Ablation Studies

Ablation studies of the Saliency Module (SM)

Objects

Without the SM, salient objects are missed, or non-salient objects are detected as salient.

Attention maps & Object-level masks

With the SM, each attention map captures the shape of an object well, resulting in an accurate object-level mask.

Reference Codes