
:punch: Commonly used attention modules in CV; plug-and-play modules; ViT models. A PyTorch implementation collection of attention modules and plug-and-play modules.

Primary language: Python. License: MIT.

Awesome-Attention-Mechanism-in-cv


Introduction

This repository collects PyTorch implementations of a variety of attention mechanisms used in network design for computer vision, along with a collection of plug-and-play modules. Because of limited time and energy, many modules may not be included yet.

Suggestions and improvements are welcome; please submit an issue or PR.

Attention Mechanism

Paper Publish Link Main Idea Blog
Global Second-order Pooling Convolutional Networks CVPR19 GSoPNet Combines higher-order statistics with attention in the middle of the network
Neural Architecture Search for Lightweight Non-Local Networks CVPR20 AutoNL NAS+LightNL
Squeeze and Excitation Network CVPR18 SENet The most classic channel attention zhihu
Selective Kernel Network CVPR19 SKNet SE + dynamic kernel selection zhihu
Convolutional Block Attention Module ECCV18 CBAM Spatial + channel attention in series zhihu
BottleNeck Attention Module BMVC18 BAM Spatial + channel attention in parallel zhihu
Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks MICCAI18 scSE Spatial + channel attention in parallel zhihu
Non-local Neural Networks CVPR18 Non-Local(NL) Self-attention zhihu
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond ICCVW19 GCNet Improves on NL zhihu
CCNet: Criss-Cross Attention for Semantic Segmentation ICCV19 CCNet Improves on NL
SA-Net: Shuffle Attention for Deep Convolutional Neural Networks ICASSP21 SANet SGE + channel shuffle zhihu
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks CVPR20 ECANet Improvement on SE
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks CoRR19 SGENet Group+spatial+channel
FcaNet: Frequency Channel Attention Networks ICCV21 FcaNet SE operation in the frequency domain
A²-Nets: Double Attention Networks NeurIPS18 DANet Applies NL to the spatial and channel dimensions
Asymmetric Non-local Neural Networks for Semantic Segmentation ICCV19 APNB SPP + NL
Efficient Attention: Attention with Linear Complexities CoRR18 EfficientAttention Reduces the computational cost of NL
Image Restoration via Residual Non-local Attention Networks ICLR19 RNAN
Exploring Self-attention for Image Recognition CVPR20 SAN Theoretically rich, yet simple to implement
An Empirical Study of Spatial Attention Mechanisms in Deep Networks ICCV19 None MSRA's empirical study of self-attention
Object-Contextual Representations for Semantic Segmentation ECCV20 OCRNet Complex interaction mechanism that works well in practice
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification TNNLS20 IAUNet Introduces temporal information
ResNeSt: Split-Attention Networks CoRR20 ResNeSt SK+ResNeXt
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks NeurIPS18 GENet Follow-up to SE
Improving Convolutional Networks with Self-calibrated Convolutions CVPR20 SCNet Self-calibrated convolution
Rotate to Attend: Convolutional Triplet Attention Module WACV21 TripletAttention Pairwise interaction among the C, H, and W dimensions
Dual Attention Network for Scene Segmentation CVPR19 DANet self-attention
Relation-Aware Global Attention for Person Re-identification CVPR20 RGANet For ReID
Attentional Feature Fusion WACV21 AFF Attention-based feature fusion
An Attentive Survey of Attention Models CoRR19 None Covers attention mechanisms in NLP, CV, recommender systems, and more
Stand-Alone Self-Attention in Vision Models NeurIPS19 FullAttention Replaces all convolutions with self-attention
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation ECCV18 BiSeNet FPN-like feature fusion zhihu
DCANet: Learning Connected Attentions for Convolutional Neural Networks CoRR20 DCANet Strengthens information flow between attention modules
An Empirical Study of Spatial Attention Mechanisms in Deep Networks ICCV19 None Targeted analysis of spatial attention
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition CVPR17 Oral RA-CNN Fine-grained recognition
Guided Attention Network for Object Detection and Counting on Drones ACM MM20 GANet Object detection
Attention Augmented Convolutional Networks ICCV19 AANet Multi-head + additional feature maps
Global Self-Attention Networks for Image Recognition ICLR21 GSA A new global attention module
Attention-Guided Hierarchical Structure Aggregation for Image Matting CVPR20 HAttMatting Image matting: channel attention on high-level features, then spatial attention to guide low-level features
Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks ECCV20 None Weight excitation mechanism complementary to SE
Expectation-Maximization Attention Networks for Semantic Segmentation ICCV19 Oral EMANet EM+Attention
Dense-and-implicit attention network AAAI20 DIANet LSTM + feature sharing across blocks + channel attention
Coordinate Attention for Efficient Mobile Network Design CVPR21 CoordAttention Attention along the horizontal and vertical directions
Cross-channel Communication Networks NIPS19 C3Net GNN+SE
Gated Convolutional Networks with Hybrid Connectivity for Image Classification AAAI20 HCGNet Borrows some concepts from LSTM
Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network AAAI19 None Dropout+SE
BA^2M: A Batch Aware Attention Module for Image Classification CVPR21 None Builds attention across the samples in a batch
EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network CoRR21 EPSANet Multi-scale
Stand-Alone Self-Attention in Vision Models NIPS19 SASA Non-local variant
ResT: An Efficient Transformer for Visual Recognition CoRR21 ResT Self-attention variant
SPANet: Spatial Pyramid Attention Network for Enhanced Image Recognition ICME20 SPANet Pyramid built from multiple AAP (adaptive average pooling) layers
Space-time Mixing Attention for Video Transformer CoRR21 X-ViT (not released) ViT + spatio-temporal attention
DMSANet: Dual Multi Scale Attention Network CoRR21 Not released yet Two scales + lightweight
CompConv: A Compact Convolution Module for Efficient Feature Learning CoRR21 Not released yet Res2Net + GhostNet
VOLO: Vision Outlooker for Visual Recognition CoRR21 VOLO Attention on ViT
Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism CoRR21 Not released yet Attention at the auxiliary-head level
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning CoRR21 MUSE Attention Improves self-attention (SA) for NLP
Polarized Self-Attention: Towards High-quality Pixel-wise Regression CoRR21 PSA Pixel-wise regression
CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation TMI21 CA-Net Spatial Attention
BAM: A Lightweight and Efficient Balanced Attention Mechanism for Single Image Super Resolution CoRR21 BAM Super resolution
Attention as Activation CoRR21 ATAC activation + attention
Region-based Non-local Operation for Video Classification CoRR21 RNL video classification
MSAF: Multimodal Split Attention Fusion CoRR21 MSAF MultiModal
All-Attention Layer CoRR19 None Transformer layer
Compact Global Descriptor CoRR20 CGD Attention between every pair of channels
SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks ICML21 SimAM Neuron energy function inspired by neuroscience
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution ICCV19 OctConv Improvement from a frequency perspective
Contextual Transformer Networks for Visual Recognition ICCV21 CoTNet Presented as a Transformer improvement, but in practice very close to non-local
Residual Attention: A Simple but Effective Method for Multi-Label Recognition ICCV21 CSRA For multi-label image recognition
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation CVPR20 SEAM Weakly supervised
An Attention Module for Convolutional Neural Networks ICCV2021 AW-Conv Increases the capacity of the SE part
Attentive Normalization Arxiv2020 None BN + attention
Person Re-identification via Attention Pyramid TIP21 APNet Attention pyramid + ReID
Unifying Nonlocal Blocks for Neural Networks ICCV21 SNL Non-local + spectral graph concepts
Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context ICCVW21 None Spatial+Channel
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network ICCVW21 PP-NAS Searches for plug-and-play modules
Distilling Knowledge via Knowledge Review CVPR21 ReviewKD Knowledge distillation + spatial attention
Dynamic Region-Aware Convolution CVPR21 None Dynamically generated convolution kernels
Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation CVPR21 None STN-GRU
Introvert: Human Trajectory Prediction via Conditional 3D Attention CVPR21 None 3D Attention
SSAN: Separable Self-Attention Network for Video Representation Learning CVPR21 None SSAN for video
Delving Deep into Many-to-many Attention for Few-shot Video Object Segmentation CVPR21 DANet Few-Shot Video Segmentation
A2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation CVPR21 None FPN+Attention
Image Super-Resolution with Non-Local Sparse Attention CVPR21 None SR+Non local
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection CVPR21 LaneATT Lane detection
NAM: Normalization-based Attention Module CoRR21 NAM Normalization + attention
NAS-SCAM: Neural Architecture Search-Based Spatial and Channel Joint Attention Module for Nuclei Semantic Segmentation and Classification MICCAI20 NAS-SCAM Attention Search
NASABN: A Neural Architecture Search Framework for Attention-Based Networks IJCNN20 None NLP+NAS
Att-DARTS: Differentiable Neural Architecture Search for Attention IJCNN20 Att-Darts DARTS + attention search
On the Integration of Self-Attention and Convolution CoRR21 ACMix self attention+conv
BoxeR: Box-Attention for 2D and 3D Transformers CoRR21 None Object detection + attention
CoAtNet: Marrying Convolution and Attention for All Data Sizes NIPS21 coatnet VIT
Pay Attention to MLPs NIPS21 gmlp MLP
IC-Conv: Inception Convolution With Efficient Dilation Search CVPR21 Oral IC-Conv Dilation rate search
SRM: A Style-based Recalibration Module for Convolutional Neural Networks ICCV19 SRM Style-based recalibration attention
SPANet: Spatial Pyramid Attention Network for Enhanced Image Recognition ICME20 SPANet SE+SP
Competitive Inner-Imaging Squeeze and Excitation for Residual Network CoRR18 Competitive-SENet Incorporates skip-connection information
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks WACV20 ULSAM Spatial attention
Augmenting Convolutional networks with attention-based aggregation CoRR21 None Adds linear attention on top of the ViT paradigm
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification AAAI21 CAP Combines anchors, LSTM, SE, etc. into an attention mechanism for fine-grained recognition
Instance Enhancement Batch Normalization: An Adaptive Regulator of Batch Noise AAAI20 IEBN Explains why SENet works from a noise-regulation perspective and proposes targeted improvements
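Many entries above are described as refinements of either SE (channel reweighting) or NL (non-local self-attention). As a point of reference, here is a minimal PyTorch sketch of both patterns, assuming NCHW input. The class names `SEBlock` and `NonLocalBlock`, the `reduction=16` default, and the halved inner width of the non-local block are illustrative choices, not this repository's code; normalisation layers are omitted for brevity.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using global context (sketch)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(                     # excitation: bottleneck MLP
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),                            # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # broadcast over H and W


class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block with a residual connection (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        inter = max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, inter, 1)   # query projection
        self.phi = nn.Conv2d(channels, inter, 1)     # key projection
        self.g = nn.Conv2d(channels, inter, 1)       # value projection
        self.out = nn.Conv2d(inter, channels, 1)     # back to the input width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # B x HW x C'
        k = self.phi(x).flatten(2)                    # B x C' x HW
        v = self.g(x).flatten(2).transpose(1, 2)      # B x HW x C'
        attn = torch.softmax(q @ k, dim=-1)           # B x HW x HW pairwise weights
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                        # residual connection


if __name__ == "__main__":
    x = torch.randn(2, 64, 8, 8)
    print(SEBlock(64)(x).shape)        # torch.Size([2, 64, 8, 8])
    print(NonLocalBlock(64)(x).shape)  # torch.Size([2, 64, 8, 8])
```

Both modules return a tensor with the same shape as their input, which is what lets them be inserted between two layers of an existing backbone. Note the cost difference: SE adds a global pooling plus a small MLP, while the non-local block builds an HW x HW attention matrix, which is the cost that several NL variants in the table aim to reduce.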

Dynamic Networks

Title Publish Github Main Idea
Dynamic Neural Networks: A Survey CoRR21 None Survey
CondConv: Conditionally Parameterized Convolutions for Efficient Inference NIPS19 CondConv Kernel parameters are computed from the input
DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks CoRR20 None Learns a set of kernel coefficients to fuse multiple fixed kernels into one dynamic kernel
Dynamic Convolution: Attention over Convolution Kernels CVPR20 Dynamic-convolution-Pytorch Fuses multiple kernels to increase representational power
WeightNet: Revisiting the Design Space of Weight Network ECCV20 weightNet Unifies SENet and CondConv
Dynamic Filter Networks
Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution
SkipNet: Learning Dynamic Routing in Convolutional Networks
Pay Less Attention with Lightweight and Dynamic Convolutions
Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations
Dynamic Group Convolution for Accelerating Convolutional Neural Networks ECCV20 dgc Group-wise locality
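CondConv, DyNet, and Dynamic Convolution share one idea: compute input-dependent coefficients and use them to mix several candidate kernels into a single per-sample kernel. Below is a minimal PyTorch sketch in the spirit of Dynamic Convolution (softmax attention over K kernels), assuming stride 1, no bias, an odd kernel size with "same" padding, and a router simplified to global average pooling plus one linear layer. The class name `DynamicConv2d` and the `num_kernels=4` default are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv2d(nn.Module):
    """Mix K candidate kernels with input-dependent softmax weights (sketch)."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3,
                 num_kernels: int = 4, temperature: float = 1.0):
        super().__init__()
        self.out_ch, self.k = out_ch, kernel_size
        self.temperature = temperature
        # K candidate kernels: K x out x in x k x k
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.1)
        # Simplified router: pooled features -> one logit per candidate kernel
        self.router = nn.Linear(in_ch, num_kernels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        logits = self.router(x.mean(dim=(2, 3)))                 # B x K
        alpha = torch.softmax(logits / self.temperature, dim=1)  # rows sum to 1
        # Per-sample kernel: sum_k alpha[b, k] * weight[k]
        kernels = torch.einsum("bk,koihw->boihw", alpha, self.weight)
        # Grouped-conv trick: fold the batch into channels, one group per sample
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            kernels.reshape(b * self.out_ch, c, self.k, self.k),
            padding=self.k // 2,
            groups=b,
        )
        return out.reshape(b, self.out_ch, h, w)


if __name__ == "__main__":
    m = DynamicConv2d(3, 5)
    print(m(torch.randn(2, 3, 8, 8)).shape)  # torch.Size([2, 5, 8, 8])
```

Aggregating kernels before convolving is the point of the design: the cost of the convolution itself stays that of a single kernel, and only the cheap router and the weighted sum scale with K.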

Plug and Play Module

Title Publish Github Main Idea
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks ICCV19 ACNet Re-parameterization
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs TPAMI18 ASPP Atrous (dilated) convolution
MixConv: Mixed Depthwise Convolutional Kernels BMVC19 MixedConv Convolutions with different kernel sizes
Pyramid Scene Parsing Network CVPR17 PSP Pyramid pooling
Receptive Field Block Net for Accurate and Fast Object Detection ECCV18 RFB Dilated convolution
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing CVPR20 SPNet Pooling along two directions
SSH: Single Stage Headless Face Detector ICCV17 SSH The simplest receptive-field module
GhostNet: More Features from Cheap Operations CVPR20 GhostNet Simple yet effective
SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping TIP21 SlimConv Flip operation + SE
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks ICML19 EfficientNet Excellent network building blocks
CondConv: Conditionally Parameterized Convolutions for Efficient Inference NIPS19 CondConv Dynamic convolution
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network ICCVW21 PPNAS Search over inter-group connections
Dynamic Convolution: Attention over Convolution Kernels CVPR20 DynamicConv Dynamic filters
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer ECCV20 PSConv Fine-grained multi-scale
DCANet: Dense Context-Aware Network for Semantic Segmentation ECCV20 DCANet Attention
Enhancing feature fusion for human pose estimation MVA20 SEB Feature fusion
Object-Contextual Representations for Semantic Segmentation ECCV20 HRNet-OCR OCR module
DO-Conv: Depthwise Over-parameterized Convolutional Layer CoRR20 DO-Conv over-parameterized Conv
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition CoRR20 PyConv Convolutions with different kernel sizes
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks WACV20 ULSAM Spatial attention
Dynamic Group Convolution for Accelerating Convolutional Neural Networks ECCV20 DGC Dynamic group convolution
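ACNet's "re-parameterization" means using extra branches during training and folding them into a single convolution for inference. Because convolution is linear, a 1x3 and a 3x1 kernel can be zero-padded to 3x3 and added to the square kernel. The sketch below assumes BN-free branches for brevity (ACNet also folds a BatchNorm per branch); the class name `AsymConvBlock` and the method `fuse` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AsymConvBlock(nn.Module):
    """Training-time 3x3 + 1x3 + 3x1 branches; BatchNorm omitted in this sketch."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.square = nn.Conv2d(in_ch, out_ch, (3, 3), padding=(1, 1), bias=False)
        self.hor = nn.Conv2d(in_ch, out_ch, (1, 3), padding=(0, 1), bias=False)
        self.ver = nn.Conv2d(in_ch, out_ch, (3, 1), padding=(1, 0), bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Branch outputs have identical shapes, so they can simply be summed
        return self.square(x) + self.hor(x) + self.ver(x)

    def fuse(self) -> nn.Conv2d:
        """Fold the three branches into one 3x3 convolution for inference."""
        w = self.square.weight.detach().clone()
        # 1x3 kernel goes into the middle row: pad one row above and below
        w += F.pad(self.hor.weight.detach(), (0, 0, 1, 1))
        # 3x1 kernel goes into the middle column: pad one column left and right
        w += F.pad(self.ver.weight.detach(), (1, 1, 0, 0))
        fused = nn.Conv2d(w.shape[1], w.shape[0], 3, padding=1, bias=False)
        fused.weight.data.copy_(w)
        return fused
```

After `fuse()`, inference costs exactly one 3x3 convolution while producing the same output as the three-branch block. With BatchNorm in each branch, every BN must first be folded into its branch's kernel and a bias term before the kernels are summed.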

Vision Transformer

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021, ViT

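The "16x16 words" in the ViT title are non-overlapping 16x16 image patches, each linearly projected to a token. For a 224x224 input this gives 224 / 16 = 14 patches per side, so 14 x 14 = 196 patch tokens, plus one class token for 197 in total. A minimal PyTorch sketch, assuming a 768-dimensional embedding (the ViT-Base width) and zero-initialised learnable position embeddings; the class name `PatchEmbed` is illustrative. A convolution whose kernel size equals its stride is used here as a compact way to apply the same linear projection to every patch.

```python
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split an image into patches and project each patch to a token (sketch)."""

    def __init__(self, img_size: int = 224, patch: int = 16,
                 in_ch: int = 3, dim: int = 768):
        super().__init__()
        self.num_patches = (img_size // patch) ** 2
        # kernel_size == stride == patch: one linear map per non-overlapping patch
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(x).flatten(2).transpose(1, 2)  # B x N x D, row-major patches
        cls = self.cls_token.expand(x.shape[0], -1, -1)   # B x 1 x D
        return torch.cat([cls, tokens], dim=1) + self.pos_embed


if __name__ == "__main__":
    out = PatchEmbed()(torch.randn(2, 3, 224, 224))
    print(out.shape)  # torch.Size([2, 197, 768])
```

The resulting B x 197 x D sequence is what the Transformer encoder consumes; many entries in the table below (CvT, CeiT, Early Convolutions) change exactly this tokenization stage.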

Title Publish Github Main Idea
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows ICCV21 SwinT
CPVT: Conditional Positional Encodings for Vision Transformer CoRR21 CPVT
GLiT: Neural Architecture Search for Global and Local Image Transformer CoRR21 GLiT NAS
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases CoRR21 ConViT GPSA
CeiT: Incorporating Convolution Designs into Visual Transformers CoRR21 CeiT LCA,LeFF
BoTNet: Bottleneck Transformers for Visual Recognition CVPR21 BoTNet Non-local-block-like
CvT: Introducing Convolutions to Vision Transformers ICCV21 CvT projection
TransCNN: Transformer in Convolutional Neural Networks CoRR21 TransCNN
ResT: An Efficient Transformer for Visual Recognition CoRR21 ResT
CoaT: Co-Scale Conv-Attentional Image Transformers CoRR21 CoaT
ConTNet: Why not use convolution and transformer at the same time? CoRR21 ConTNet
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification NIPS21 DynamicViT
DVT: Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition NIPS21 DVT
CoAtNet: Marrying Convolution and Attention for All Data Sizes NIPS21 CoAtNet
Early Convolutions Help Transformers See Better CoRR21 None
Compact Transformers: Escaping the Big Data Paradigm with Compact Transformers CoRR21 CCT
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer CoRR21 MobileViT
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference CoRR21 LeViT
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer CoRR21 ShuffleTransformer
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias CoRR21 ViTAE
LocalViT: Bringing Locality to Vision Transformers CoRR21 LocalViT
DeiT: Training data-efficient image transformers & distillation through attention ICML21 DeiT
CaiT: Going deeper with Image Transformers ICCV21 CaiT
Efficient Training of Visual Transformers with Small-Size Datasets NIPS21 None
Vision Transformer with Deformable Attention CoRR22 DAT DeformConv+SA
MaxViT: Multi-Axis Vision Transformer CoRR22 None dilated attention
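Several entries above (Swin Transformer, Shuffle Transformer, MaxViT) restrict self-attention to local windows to reduce its quadratic cost. Below is a minimal sketch of Swin-style window partitioning, assuming a B x H x W x C tensor with H and W divisible by the window size `ws`. The function names are illustrative. The attention is single-head and projection-free to keep the focus on the windowing. The cyclic shift with `torch.roll` lets consecutive blocks exchange information across window borders; the attention mask that Swin applies after shifting is omitted here.

```python
import torch


def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """B x H x W x C -> (B * nWindows) x (ws * ws) x C, one row of tokens per window."""
    b, h, w, c = x.shape
    x = x.reshape(b, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)


def window_reverse(windows: torch.Tensor, ws: int, h: int, w: int) -> torch.Tensor:
    """Inverse of window_partition: back to B x H x W x C."""
    b = windows.shape[0] // ((h // ws) * (w // ws))
    x = windows.reshape(b, h // ws, w // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)


def window_self_attention(x: torch.Tensor, ws: int, shift: int = 0) -> torch.Tensor:
    """Single-head, projection-free attention inside each (optionally shifted) window."""
    b, h, w, c = x.shape
    if shift:
        # Cyclic shift so the new windows straddle the previous window borders
        x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    win = window_partition(x, ws)                                # nW*B x N x C
    attn = torch.softmax(win @ win.transpose(1, 2) / c ** 0.5, dim=-1)
    out = window_reverse(attn @ win, ws, h, w)
    if shift:
        out = torch.roll(out, shifts=(shift, shift), dims=(1, 2))  # undo the shift
    return out
```

Each window attends only over ws * ws tokens, so the attention cost grows linearly with the number of windows instead of quadratically with H * W. Without the mask, tokens that wrap around the image border after the roll can attend to each other, which Swin prevents by masking those pairs.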

Contribute

Feel free to suggest additional papers and their corresponding code links in an issue.

Thanks to @dedekinds for pointing out a problem in the DIANet description.
