Deep Connected Attention Networks (DCANet)

Illustration

Figure 1. Illustration of our DCANet. We visualize intermediate feature activations using Grad-CAM. Vanilla SE-ResNet50 varies its focus dramatically across stages. In contrast, our DCA-enhanced SE-ResNet50 progressively and recursively adjusts its focus, attending closely to the target object.
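For reference, this kind of visualization can be reproduced with a plain forward/backward hook pair on the stage of interest. Below is a minimal Grad-CAM sketch, not the code used for Figure 1; the helper name and the choice of target layer are our own:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM: weight a layer's activations by the
    spatially pooled gradients of the target class score."""
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(
        lambda m, inp, out: activations.append(out))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gin, gout: gradients.append(gout[0]))
    try:
        logits = model(image)                     # (1, num_classes)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        act, grad = activations[0], gradients[0]  # (1, C, H, W)
        weights = grad.mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * act).sum(dim=1))  # (1, H, W)
        return cam / (cam.max() + 1e-8)           # normalize to [0, 1]
    finally:
        h1.remove()
        h2.remove()
```

For a torchvision-style ResNet-50 in eval mode, `grad_cam(model, x, model.layer3)` would produce a heat map for an intermediate stage like those in Figure 1.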

Approach

Figure 2. An overview of our Deep Connected Attention Network. We connect the output of the transformation module in the previous attention block to the output of the extraction module in the current attention block. When there are multiple attention dimensions, we connect attentions along each dimension. Here we show an example with two attention dimensions; it extends naturally to more.
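To make the connection concrete, here is a minimal sketch of a DCA-enhanced SE block as we read Figure 2. It is an illustration, not this repository's implementation: the learnable weighted sum (`alpha`/`beta`) used as the fusion function, and the assumption that adjacent blocks share a channel count, are ours.

```python
import torch
import torch.nn as nn

class DCASEBlock(nn.Module):
    """Sketch of a DCA-enhanced SE attention block (our reading of Fig. 2):
    the previous block's transformation output is fused with the current
    block's extraction output before the current transformation step."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.extract = nn.AdaptiveAvgPool2d(1)           # extraction module
        self.transform = nn.Sequential(                  # transformation module
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.alpha = nn.Parameter(torch.ones(1))         # assumed fusion weights
        self.beta = nn.Parameter(torch.ones(1))

    def forward(self, x, prev=None):
        b, c, _, _ = x.shape
        extracted = self.extract(x).view(b, c)           # squeeze: (B, C)
        if prev is not None:
            # The attention connection: fuse the previous block's
            # transformation output (assumed to match in channels).
            extracted = self.alpha * extracted + self.beta * prev
        transformed = self.transform(extracted)          # excitation: (B, C)
        scale = torch.sigmoid(transformed)
        # Return the transformation output as well, so it can feed the
        # next attention block's extraction stage.
        return x * scale.view(b, c, 1, 1), transformed
```

Chaining blocks then just threads the second return value into the next block's `prev` argument.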

Implementation

In this repository, all models are implemented in PyTorch.

We use the standard data augmentation strategy used with ResNet, as sketched below.
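That pipeline is the common ImageNet recipe from the ResNet paper; a torchvision sketch follows, where the crop/resize sizes are the usual defaults, assumed rather than read from this repository:

```python
from torchvision import transforms

# Standard ImageNet statistics for normalization.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # random scale/aspect-ratio crop
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

val_transform = transforms.Compose([     # single-crop evaluation
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
```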

To reproduce our DCANet results, please refer to Usage.md.

Trained Models

😊 All trained models and training log files are uploaded to an anonymous Google Drive.

😊 We provide corresponding links in the "download" column.



Table 1. Single-crop classification accuracy (%) on the ImageNet validation set. We re-train models with the PyTorch framework and report the results in the "re-implement" columns; the corresponding DCANet variants are in the "DCANet" columns. The best performance in each pair is marked in bold. "-" means no experiment, since our DCA module is designed to enhance attention blocks, which do not exist in the base networks.
| Model | Top-1 (re-impl.) | Top-5 (re-impl.) | FLOPs (G) | Params | Download | Top-1 (DCANet) | Top-5 (DCANet) | FLOPs (G) | Params | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| ResNet50 | 75.90 | 92.72 | 4.12 | 25.56M | model / log | - | - | - | - | - |
| SE-ResNet50 | 77.29 | 93.65 | 4.13 | 28.09M | model / log | **77.55** | **93.77** | 4.13 | 28.65M | model / log |
| SK-ResNet50 | 77.79 | 93.76 | 5.98 | 37.12M | model / log | **77.94** | **93.90** | 5.98 | 37.48M | model / log |
| GEθ-ResNet50 | 76.24 | 92.98 | 4.13 | 25.56M | model / log | **76.75** | **93.36** | 4.13 | 26.12M | model / log |
| GC-ResNet50 | 74.90 | 92.28 | 4.13 | 28.11M | model / log | **75.42** | **92.47** | 4.13 | 28.63M | model / log |
| CBAM-ResNet50 | 77.28 | 93.60 | 4.14 | 28.09M | model / log | **77.83** | **93.72** | 4.14 | 30.90M | model / log |
| Mnas1_0 | 71.72 | 90.32 | 0.33 | 4.38M | model / log | - | - | - | - | - |
| SE-Mnas1_0 | 69.69 | 89.12 | 0.33 | 4.42M | model / log | **71.76** | **90.40** | 0.33 | 4.48M | model / log |
| GEθ-Mnas1_0 | 72.72 | 90.87 | 0.33 | 4.38M | model / log | **72.82** | **91.18** | 0.33 | 4.48M | model / log |
| CBAM-Mnas1_0 | 69.13 | 88.92 | 0.33 | 4.42M | model / log | **71.00** | **89.78** | 0.33 | 4.56M | model / log |
| MobileNetV2 | 71.03 | 90.07 | 0.32 | 3.50M | model / log | - | - | - | - | - |
| SE-MobileNetV2 | 72.05 | 90.58 | 0.32 | 3.56M | model / log | **73.24** | **91.14** | 0.32 | 3.65M | model / log |
| SK-MobileNetV2 | 74.05 | **91.85** | 0.35 | 5.28M | model / log | **74.45** | **91.85** | 0.36 | 5.91M | model / log |
| GEθ-MobileNetV2 | 72.28 | **90.91** | 0.32 | 3.50M | model / log | **72.47** | 90.68 | 0.32 | 3.59M | model / log |
| CBAM-MobileNetV2 | 71.91 | 90.51 | 0.32 | 3.57M | model / log | **73.04** | **91.18** | 0.34 | 3.65M | model / log |
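The parameter counts above are easy to sanity-check; for example, for the torchvision ResNet-50 baseline (a sketch; FLOPs require a separate profiler such as `thop` or `fvcore`, not shown here):

```python
from torchvision.models import resnet50

model = resnet50()
n_params = sum(p.numel() for p in model.parameters())
print(f"Params: {n_params / 1e6:.2f}M")  # prints ~25.56M, matching Table 1
```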


Table 2. Detection performance (%) with different backbones on the MS-COCO validation set. We employ two state-of-the-art detectors, RetinaNet and Cascade R-CNN, in our detection experiments.

| Detector | Backbone | AP (50:95) | AP (50) | AP (75) | AP (S) | AP (M) | AP (L) | Download |
|---|---|---|---|---|---|---|---|---|
| RetinaNet | ResNet50 | 36.2 | 55.9 | 38.5 | 19.4 | 39.8 | 48.3 | model / log |
| RetinaNet | SE-ResNet50 | 37.4 | 57.8 | 39.8 | 20.6 | 40.8 | 50.3 | model / log |
| RetinaNet | DCA-SE-ResNet50 | 37.7 | 58.2 | 40.1 | 20.8 | 40.9 | 50.4 | model / log |
| Cascade R-CNN | ResNet50 | 40.6 | 58.9 | 44.2 | 22.4 | 43.7 | 54.7 | model / log |
| Cascade R-CNN | GC-ResNet50 | 41.1 | 59.7 | 44.6 | 23.6 | 44.1 | 54.3 | model / log |
| Cascade R-CNN | DCA-GC-ResNet50 | 41.4 | 60.2 | 44.7 | 22.8 | 45.0 | 54.2 | model / log |
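This section does not name the detection codebase. Assuming an MMDetection-style setup, plugging a DCA backbone into RetinaNet would look roughly like the sketch below; the `DCASEResNet` type name and the checkpoint path are hypothetical placeholders, not real registered components.

```python
# Hypothetical MMDetection-style config sketch: reuse the stock RetinaNet
# config and swap in a custom, pre-registered backbone.
_base_ = './retinanet_r50_fpn_1x_coco.py'

model = dict(
    backbone=dict(
        type='DCASEResNet',   # hypothetical custom backbone type
        depth=50,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='path/to/dca_se_resnet50_imagenet.pth'),
    ))
```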