Awesome-Vision-Attentions

A summary of papers on visual attention. Related code, based on Jittor, is being released gradually.


This repo accompanies the survey paper Attention Mechanisms in Computer Vision: A Survey (paper).

A Chinese-language blog post introducing the paper: link

Citation

If this survey is helpful for your work, please cite it:

@misc{guo2021attention_survey,
      title={Attention Mechanisms in Computer Vision: A Survey}, 
      author={Meng-Hao Guo and Tian-Xing Xu and Jiang-Jiang Liu and Zheng-Ning Liu and Peng-Tao Jiang and Tai-Jiang Mu and Song-Hai Zhang and Ralph R. Martin and Ming-Ming Cheng and Shi-Min Hu},
      year={2021},
      eprint={2111.07624},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


  • Code for different attention mechanisms, based on Jittor, has now been released.
  • TODO: collect more related papers. Contributions are welcome.

🔥 marks papers with more than 200 citations.

Channel attention

  • Squeeze-and-Excitation Networks (CVPR 2018) pdf, (PAMI 2019 version) pdf 🔥
  • Image super-resolution using very deep residual channel attention networks (ECCV 2018) pdf 🔥
  • Context encoding for semantic segmentation (CVPR 2018) pdf 🔥
  • Spatio-temporal channel correlation networks for action classification (ECCV 2018) pdf
  • Global second-order pooling convolutional networks (CVPR 2019) pdf
  • SRM: A style-based recalibration module for convolutional neural networks (ICCV 2019) pdf
  • You look twice: Gaternet for dynamic filter selection in cnns (CVPR 2019) pdf
  • Second-order attention network for single image super-resolution (CVPR 2019) pdf 🔥
  • DIANet: Dense-and-Implicit Attention Network (AAAI 2020) pdf
  • Spsequencenet: Semantic segmentation network on 4d point clouds (CVPR 2020) pdf
  • Ecanet: Efficient channel attention for deep convolutional neural networks (CVPR 2020) pdf 🔥
  • Gated channel transformation for visual recognition (CVPR 2020) pdf
  • Fcanet: Frequency channel attention networks (ICCV 2021) pdf
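
As a rough illustration of what the channel attention papers above have in common, here is a minimal pure-Python sketch in the style of a Squeeze-and-Excitation block: pool each channel to a single number, pass the result through a small bottleneck, and rescale the channels with the resulting gates. It is not taken from this repo's Jittor code; the function name `se_block`, the toy weights `w1` and `w2`, and the omission of biases are all assumptions made for brevity.

```python
import math

def se_block(x, w1, w2):
    """Reweight the channels of x (C x H x W nested lists), SE-style.

    w1: (C/r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C/r) weights of the expansion layer (hypothetical values).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    out = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]
    return out, gates

# Toy input: 2 channels of size 2 x 2, reduced to a single hidden unit.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]      # shape 1 x 2
w2 = [[1.0], [-1.0]]   # shape 2 x 1
y, gates = se_block(x, w1, w2)
print(gates)  # the first channel gets a gate above 0.5, the second below
```

The other entries in this list mostly vary the squeeze step (for example second-order or frequency-domain statistics) or the excitation step (for example a 1D convolution instead of the bottleneck), while keeping the final per-channel rescaling.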

Spatial attention

  • Recurrent models of visual attention (NeurIPS 2014) pdf 🔥
  • Show, attend and tell: Neural image caption generation with visual attention (PMLR 2015) pdf 🔥
  • Draw: A recurrent neural network for image generation (ICML 2015) pdf 🔥
  • Spatial transformer networks (NeurIPS 2015) pdf 🔥
  • Multiple object recognition with visual attention (ICLR 2015) pdf 🔥
  • Action recognition using visual attention (arXiv 2015) pdf 🔥
  • Videolstm convolves, attends and flows for action recognition (arXiv 2016) pdf 🔥
  • Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition (CVPR 2017) pdf 🔥
  • Learning multi-attention convolutional neural network for fine-grained image recognition (ICCV 2017) pdf 🔥
  • Diversified visual attention networks for fine-grained object classification (TMM 2017) pdf 🔥
  • High-Order Attention Models for Visual Question Answering (NeurIPS 2017) pdf
  • Attentional pooling for action recognition (NeurIPS 2017) pdf 🔥
  • Non-local neural networks (CVPR 2018) pdf 🔥
  • Attentional shapecontextnet for point cloud recognition (CVPR 2018) pdf
  • Relation networks for object detection (CVPR 2018) pdf 🔥
  • a2-nets: Double attention networks (NeurIPS 2018) pdf 🔥
  • Attention-aware compositional network for person re-identification (CVPR 2018) pdf 🔥
  • Tell me where to look: Guided attention inference network (CVPR 2018) pdf 🔥
  • Pedestrian alignment network for large-scale person re-identification (TCSVT 2018) pdf 🔥
  • Learn to pay attention (ICLR 2018) pdf 🔥
  • Attention U-Net: Learning Where to Look for the Pancreas (MIDL 2018) pdf 🔥
  • Psanet: Point-wise spatial attention network for scene parsing (ECCV 2018) pdf 🔥
  • Self attention generative adversarial networks (ICML 2019) pdf 🔥
  • Attentional pointnet for 3d-object detection in point clouds (CVPRW 2019) pdf
  • Co-occurrent features in semantic segmentation (CVPR 2019) pdf
  • Factor Graph Attention (CVPR 2019) pdf
  • Attention augmented convolutional networks (ICCV 2019) pdf 🔥
  • Local relation networks for image recognition (ICCV 2019) pdf
  • Latentgnn: Learning efficient nonlocal relations for visual recognition (ICML 2019) pdf
  • Graph-based global reasoning networks (CVPR 2019) pdf 🔥
  • Gcnet: Non-local networks meet squeeze-excitation networks and beyond (ICCVW 2019) pdf 🔥
  • Asymmetric non-local neural networks for semantic segmentation (ICCV 2019) pdf 🔥
  • Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition (CVPR 2019) pdf
  • Second-order non-local attention networks for person re-identification (ICCV 2019) pdf 🔥
  • End-to-end comparative attention networks for person re-identification (ICCV 2019) pdf 🔥
  • Modeling point clouds with self-attention and gumbel subset sampling (CVPR 2019) pdf
  • Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification (arXiv 2019) pdf
  • L2g autoencoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention (arXiv 2019) pdf
  • Generative pretraining from pixels (PMLR 2020) pdf
  • Exploring self-attention for image recognition (CVPR 2020) pdf
  • Cf-sis: Semantic-instance segmentation of 3d point clouds by context fusion with self attention (ACM MM 2020) pdf
  • Disentangled non-local neural networks (ECCV 2020) pdf
  • Relation-aware global attention for person re-identification (CVPR 2020) pdf
  • Segmentation transformer: Object-contextual representations for semantic segmentation (ECCV 2020) pdf 🔥
  • Spatial pyramid based graph reasoning for semantic segmentation (CVPR 2020) pdf
  • Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation (CVPR 2020) pdf
  • End-to-end object detection with transformers (ECCV 2020) pdf 🔥
  • Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling (CVPR 2020) pdf
  • Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers (CVPR 2021) pdf
  • An image is worth 16x16 words: Transformers for image recognition at scale (ICLR 2021) pdf 🔥
  • Is Attention Better Than Matrix Decomposition? (ICLR 2021) pdf
  • Ocnet: Object context network for scene parsing (IJCV 2021) pdf 🔥
  • Point transformer (ICCV 2021) pdf
  • PCT: Point Cloud Transformer (CVMJ 2021) pdf
  • Pre-trained image processing transformer (CVPR 2021) pdf
  • An empirical study of training self-supervised vision transformers (ICCV 2021) pdf
  • Segformer: Simple and efficient design for semantic segmentation with transformers (arXiv 2021) pdf
  • Beit: Bert pre-training of image transformers (arXiv 2021) pdf
  • Beyond Self-attention: External attention using two linear layers for visual tasks (arXiv 2021) pdf
  • Query2label: A simple transformer way to multi-label classification (arXiv 2021) pdf
  • Transformer in transformer (arXiv 2021) pdf
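
Many of the spatial attention entries above (non-local networks and the transformer-based models in particular) build on pairwise self-attention between positions. The following is a minimal pure-Python sketch of that idea, not code from this repo: each position's output is a softmax-weighted average over all positions. The function name `spatial_self_attention` is hypothetical, and the learned query, key, and value projections used in the actual papers are replaced by the identity here to keep the example short.

```python
import math

def spatial_self_attention(feats):
    """Self-attention over N positions, each a feature vector of length D.

    Assumption: queries, keys and values are the raw features themselves
    (no learned projections). Returns (N x D outputs, N x N weights).
    """
    d = len(feats[0])
    out, attn = [], []
    for q in feats:
        # Scaled dot-product similarity between this position and every position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in feats]
        # Numerically stable softmax over positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attn.append(weights)
        # Output is the attention-weighted average of all position features.
        out.append([sum(w * v[j] for w, v in zip(weights, feats)) for j in range(d)])
    return out, attn

# Three positions with 2-dimensional features; positions 0 and 2 are identical.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
out, attn = spatial_self_attention(feats)
print(attn[0])  # position 0 attends equally to positions 0 and 2, less to 1
```

For an image feature map, `feats` would be the H x W grid flattened into N = H * W position vectors, which is also why the cost grows quadratically with resolution and why several of the listed papers propose cheaper approximations.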

Temporal attention

  • Jointly attentive spatial-temporal pooling networks for video-based person re-identification (ICCV 2017) pdf 🔥
  • Video person reidentification with competitive snippet-similarity aggregation and co-attentive snippet embedding (CVPR 2018) pdf
  • Scan: Self-and-collaborative attention network for video person re-identification (TIP 2019) pdf

Branch attention

  • Training very deep networks (NeurIPS 2015) pdf 🔥
  • Selective kernel networks (CVPR 2019) pdf 🔥
  • CondConv: Conditionally Parameterized Convolutions for Efficient Inference (NeurIPS 2019) pdf
  • Dynamic convolution: Attention over convolution kernels (CVPR 2020) pdf
  • ResNest: Split-attention networks (arXiv 2020) pdf 🔥

Channel & Spatial attention

  • Residual attention network for image classification (CVPR 2017) pdf 🔥
  • SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning (CVPR 2017) pdf 🔥
  • CBAM: convolutional block attention module (ECCV 2018) pdf 🔥
  • Harmonious attention network for person re-identification (CVPR 2018) pdf 🔥
  • Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks (TMI 2018) pdf
  • Mancs: A multi-task attentional network with curriculum sampling for person re-identification (ECCV 2018) pdf 🔥
  • Bam: Bottleneck attention module (BMVC 2018) pdf 🔥
  • Pvnet: A joint convolutional network of point cloud and multi-view for 3d shape recognition (ACM MM 2018) pdf
  • Learning what and where to attend (ICLR 2019) pdf
  • Dual attention network for scene segmentation (CVPR 2019) pdf 🔥
  • Abd-net: Attentive but diverse person re-identification (ICCV 2019) pdf
  • Mixed high-order attention network for person re-identification (ICCV 2019) pdf
  • Mlcvnet: Multi-level context votenet for 3d object detection (CVPR 2020) pdf
  • Improving convolutional networks with self-calibrated convolutions (CVPR 2020) pdf
  • Relation-aware global attention for person re-identification (CVPR 2020) pdf
  • Strip Pooling: Rethinking spatial pooling for scene parsing (CVPR 2020) pdf
  • Rotate to attend: Convolutional triplet attention module (WACV 2021) pdf
  • Coordinate attention for efficient mobile network design (CVPR 2021) pdf
  • Simam: A simple, parameter-free attention module for convolutional neural networks (ICML 2021) pdf

Spatial & Temporal attention

  • An end-to-end spatio-temporal attention model for human action recognition from skeleton data (AAAI 2017) pdf 🔥
  • Diversity regularized spatiotemporal attention for video-based person re-identification (arXiv 2018) 🔥
  • Interpretable spatio-temporal attention for video action recognition (ICCVW 2019) pdf
  • A Simple Baseline for Audio-Visual Scene-Aware Dialog (CVPR 2019) pdf
  • Hierarchical lstms with adaptive attention for visual captioning (TPAMI 2020) pdf
  • Stat: Spatial-temporal attention mechanism for video captioning (TMM 2020) pdf
  • Gta: Global temporal attention for video action understanding (arXiv 2020) pdf
  • Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification (CVPR 2020) pdf
  • Read: Reciprocal attention discriminator for image-to-video re-identification (ECCV 2020) pdf
  • Decoupled spatial-temporal transformer for video inpainting (arXiv 2021) pdf
  • Towards Coherent Visual Storytelling with Ordered Image Attention (arXiv 2021) pdf