vision-transformer
There are 758 repositories under the vision-transformer topic.
open-mmlab/mmdetection
OpenMMLab Detection Toolbox and Benchmark
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
NielsRogge/Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
cmhungsteve/Awesome-Transformer-Attention
A comprehensive paper list on Vision Transformers and attention, including papers, code, and related websites
JingyunLiang/SwinIR
SwinIR: Image Restoration Using Swin Transformer (official repository)
huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
FoundationVision/VAR
[GPT beats diffusion] [scaling laws in visual generation] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
google-research/scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
InternLM/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
alibaba/EasyCV
An all-in-one toolkit for computer vision
hila-chefer/Transformer-Explainability
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
microsoft/Cream
This is a collection of our NAS and Vision Transformer work.
mit-han-lab/efficientvit
EfficientViT is a new family of vision models for efficient high-resolution vision tasks.
JingyunLiang/VRT
VRT: A Video Restoration Transformer (official repository)
MCG-NJU/VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
czczup/ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
yitu-opensource/T2T-ViT
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
OpenGVLab/InternVideo
Video Foundation Models & Data for Multimodal Understanding
NVlabs/VoxFormer
Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
pprp/awesome-attention-mechanism-in-cv
Awesome List of Attention Modules and Plug&Play Modules in Computer Vision
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
hustvl/YOLOS
[NeurIPS 2021] You Only Look at One Sequence
emcf/thepipe
Feed PDFs, URLs, slides, YouTube videos, and more into vision-language models with one line of code
xxxnell/how-do-vits-work
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
sithu31296/semantic-segmentation
SOTA Semantic Segmentation Models in PyTorch
jacobgil/vit-explain
Explainability for Vision Transformers
LeapLabTHU/DAT
Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
Alibaba-MIIL/ImageNet21K
Official PyTorch implementation of "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
4DVLab/Vision-Centric-BEV-Perception
Vision-Centric BEV Perception: A Survey
uncbiag/Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
Westlake-AI/openmixup
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark