Variants of Vision Transformer and Vision Transformer for Downstream Tasks
Author: Runwei Guan
Affiliation: University of Liverpool / Xi'an Jiaotong-Liverpool University
Email: thinkerai@foxmail.com
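Nearly every variant listed below starts from the same first step as the original ViT: the image is cut into fixed-size, non-overlapping patches, each flattened into one input token. A minimal pure-Python sketch of this patchify step (illustrative only; real implementations do this with a single strided convolution):

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patch tokens.

    Each non-overlapping patch_size x patch_size patch becomes one token of
    length patch_size * patch_size * C, matching ViT's input tokenization.
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    tokens = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patch = []
            for di in range(patch_size):
                for dj in range(patch_size):
                    patch.extend(image[i + di][j + dj])  # append the C channel values
            tokens.append(patch)
    return tokens

# For a 224x224 RGB image with 16x16 patches this yields 196 tokens of
# length 768, the sequence length and token width used by ViT-Base.
```

In the full model each flattened patch is then linearly projected, a learnable class token is prepended, and position embeddings are added before the transformer encoder.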
- Vision Transformer paper code
- Swin Transformer paper code
- ViViT: A Video Vision Transformer paper
- DVT paper code
- PVT paper code
- PiT paper code
- Twins paper code
- TNT paper code
- MobileViT paper code
- CrossViT paper code
- LeViT paper code
- ViT-Lite paper
- Refiner paper code
- DeepViT paper code
- CaiT paper code
- LV-ViT paper code
- DeiT paper code
- CeiT paper code
- BoTNet paper
- ViTAE paper
- Visformer: The Vision-Friendly Transformer paper code
- Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training paper
- AdaViT: Adaptive Tokens for Efficient Vision Transformer paper
- Improved Multiscale Vision Transformers for Classification and Detection paper
- Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding paper
- Point Cloud Transformer paper
- Point Transformer paper
- Fast Point Transformer paper
- Adaptive Channel Encoding Transformer for Point Cloud Analysis paper
- A Unified Pruning Framework for Vision Transformers paper
- Pre-Trained Image Processing Transformer paper code
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers paper code
- BEVT: BERT Pretraining of Video Transformers paper
- Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text paper
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving paper
- Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval paper
- LAVT: Language-Aware Vision Transformer for Referring Image Segmentation paper
- MTFNet: Mutual-Transformer Fusion Network for RGB-D Salient Object Detection paper
- Visual-Semantic Transformer for Scene Text Recognition paper
- YOLOS: You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection paper code
- WB-DETR: Transformer-Based Detector without Backbone paper
- TSP: Rethinking Transformer-based Set Prediction for Object Detection paper
- DETR paper code
- End-to-End Object Detection with Adaptive Clustering Transformer paper
- An End-to-End Transformer Model for 3D Object Detection paper
- End-to-End Human Object Interaction Detection with HOI Transformer paper code
- Adaptive Image Transformer for One-Shot Object Detection paper
- Improving 3D Object Detection With Channel-Wise Transformer paper
- TransPose: Keypoint Localization via Transformer paper
- Voxel Transformer for 3D Object Detection paper
- Embracing Single Stride 3D Object Detector with Sparse Transformer paper
- OW-DETR: Open-world Detection Transformer paper
- MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers paper code
- Line Segment Detection Using Transformers without Edges paper
- VisTR: End-to-End Video Instance Segmentation with Transformers paper code
- SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers paper code
- Segmenter: Transformer for Semantic Segmentation paper
- Fully Transformer Networks for Semantic Image Segmentation paper
- SOTR: Segmenting Objects with Transformers paper code
- GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation paper
- Masked-attention Mask Transformer for Universal Image Segmentation paper
- Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation paper
- HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation paper
- End-to-End Human Pose and Mesh Reconstruction with Transformers paper code
- PE-former: Pose Estimation Transformer paper
- Pose Recognition with Cascade Transformers paper code
- Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer code
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer paper
- Temporal Transformer Networks with Self-Supervision for Action Recognition paper
- Co-training Transformer with Videos and Images Improves Action Recognition paper
- Transformer Tracking paper code
- Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking paper code
- MOTR: End-to-End Multiple-Object Tracking with TRansformer paper code
- SwinTrack: A Simple and Strong Baseline for Transformer Tracking paper
- Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network paper
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer paper
- 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds paper
- Spatial-Temporal Transformer for Dynamic Scene Graph Generation paper
- THUNDR: Transformer-Based 3D Human Reconstruction With Markers paper
- DoodleFormer: Creative Sketch Drawing with Transformers paper
- Image Transformer paper
- Taming Transformers for High-Resolution Image Synthesis paper code
- TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up code
- U2-Former: A Nested U-shaped Transformer for Image Restoration paper
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning paper code
- iGPT paper code
- An Empirical Study of Training Self-Supervised Vision Transformers paper code
- Self-supervised Video Transformer paper
- TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework using Self-Supervised Multi-Task Learning paper
- Development and testing of an image transformer for explainable autonomous driving systems paper
- Transformer Interpretability Beyond Attention Visualization paper code
- Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer paper
- 3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis paper
- Hformer: Pre-training and Fine-tuning Transformers for fMRI Prediction Tasks paper
- MT-TransUNet: Mediating Multi-Task Tokens in Transformers for Skin Lesion Segmentation and Classification paper
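Whatever the downstream task, the models above are all built around the same core operation: scaled dot-product self-attention. A minimal single-head, pure-Python sketch (no batching, no learned projections; purely for illustration):

```python
import math

def self_attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v are lists of d-dimensional token vectors (nested lists).
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Attention logits between this query and every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Numerically stable softmax over the keys.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Each output token is the attention-weighted average of the values.
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out
```

Many of the listed variants differ mainly in how they tame this operation's quadratic cost: Swin restricts each query to a local window of keys, while pyramid designs such as PVT downsample the key/value set.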