Anti-Oversmoothing: "Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice", ICLR, 2022 (UT Austin). [Paper][PyTorch]
QnA: "Learned Queries for Efficient Local Attention", CVPR, 2022 (Tel-Aviv). [Paper][Jax]
ClusTR: "ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers", arXiv, 2022 (The University of Adelaide, Australia). [Paper]
MobileViTv3: "MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features", arXiv, 2022 (Micron). [Paper][PyTorch]
Tri-Level: "Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training", AAAI, 2023 (Northeastern University). [Paper][Code (under construction)]
ViTCoD: "ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design", HPCA, 2023 (Georgia Tech). [Paper]
ViTALiTy: "ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention", HPCA, 2023 (Rice University). [Paper]
HeatViT: "HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers", HPCA, 2023 (Northeastern University). [Paper]
ToMe: "Token Merging: Your ViT But Faster", ICLR, 2023 (Meta). [Paper][PyTorch]
Conv + Transformer
LeViT: "LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference", ICCV, 2021 (Facebook). [Paper][PyTorch]
ParC-Net: "ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer", ECCV, 2022 (Intellifusion, China). [Paper][PyTorch]
?: "How to Train Vision Transformer on Small-scale Datasets?", BMVC, 2022 (MBZUAI). [Paper][PyTorch]
DHVT: "Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets", NeurIPS, 2022 (USTC). [Paper][Code (under construction)]
iFormer: "Inception Transformer", NeurIPS, 2022 (Sea AI Lab). [Paper][PyTorch]
DenseDCT: "Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets", NeurIPSW, 2022 (University of Kansas). [Paper]
CXV: "Convolutional Xformers for Vision", arXiv, 2022 (IIT Bombay). [Paper][PyTorch]
ConvMixer: "Patches Are All You Need?", arXiv, 2022 (CMU). [Paper][PyTorch]
MobileViTv2: "Separable Self-attention for Mobile Vision Transformers", arXiv, 2022 (Apple). [Paper][PyTorch]
UniFormer: "UniFormer: Unifying Convolution and Self-attention for Visual Recognition", arXiv, 2022 (SenseTime). [Paper][PyTorch]
EdgeFormer: "EdgeFormer: Improving Light-weight ConvNets by Learning from Vision Transformers", arXiv, 2022 (?). [Paper]
MoCoViT: "MoCoViT: Mobile Convolutional Vision Transformer", arXiv, 2022 (ByteDance). [Paper]
DynamicViT: "Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks", arXiv, 2022 (Tsinghua University). [Paper][PyTorch]
ConvFormer: "ConvFormer: Closing the Gap Between CNN and Vision Transformers", arXiv, 2022 (National University of Defense Technology, China). [Paper]
Fast-ParC: "Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs", arXiv, 2022 (Intellifusion, China). [Paper]
MetaFormer: "MetaFormer Baselines for Vision", arXiv, 2022 (Sea AI Lab). [Paper][PyTorch]
InternImage: "InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions", arXiv, 2022 (Shanghai AI Laboratory). [Paper][Code (under construction)]
VAN: "Visual Attention Network", arXiv, 2022 (Tsinghua). [Paper][PyTorch]
SD-MAE: "Masked autoencoders is an effective solution to transformer data-hungry", arXiv, 2022 (Hangzhou Dianzi University). [Paper][PyTorch (under construction)]
SATA: "Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets", WACV, 2023 (University of Kansas). [Paper][PyTorch (under construction)]
SparK: "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling", ICLR, 2023 (ByteDance). [Paper][PyTorch]
MOAT: "MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models", ICLR, 2023 (Google). [Paper][TensorFlow]