vision-transformer
There are 1,276 repositories under the vision-transformer topic.
open-mmlab/mmdetection
OpenMMLab Detection Toolbox and Benchmark
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
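A minimal usage sketch based on the pix2tex README; assumes the package is installed via `pip install pix2tex`, and the image filename is hypothetical:

```python
from PIL import Image
from pix2tex.cli import LatexOCR

# Downloads the ViT checkpoint on first use
model = LatexOCR()

# "equation.png" is a hypothetical screenshot of a rendered formula
img = Image.open("equation.png")

# Returns the predicted LaTeX source as a string
print(model(img))
```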
NielsRogge/Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
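Many of the notebooks build on the pattern below; a minimal sketch of ViT image classification with the HuggingFace Transformers library, using the public google/vit-base-patch16-224 checkpoint (the sample image URL is illustrative):

```python
import requests
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# ViT-Base/16 fine-tuned on ImageNet-1k
ckpt = "google/vit-base-patch16-224"
processor = ViTImageProcessor.from_pretrained(ckpt)
model = ViTForImageClassification.from_pretrained(ckpt)

# Illustrative sample image (two cats, from COCO)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Resize + normalize; the model patchifies internally
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the top logit back to an ImageNet label
print(model.config.id2label[logits.argmax(-1).item()])
```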
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
adithya-s-k/omniparse
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
JingyunLiang/SwinIR
SwinIR: Image Restoration Using Swin Transformer (official repository)
cmhungsteve/Awesome-Transformer-Attention
A comprehensive paper list of Vision Transformer/attention work, including papers, code, and related websites
huawei-noah/Efficient-AI-Backbones
Efficient AI backbones including GhostNet, TNT, and MLP models, developed by Huawei Noah's Ark Lab.
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
google-research/scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
towhee-io/towhee
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
mit-han-lab/efficientvit
Efficient vision foundation models for high-resolution generation and perception.
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
hila-chefer/Transformer-Explainability
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
alibaba/EasyCV
An all-in-one toolkit for computer vision
microsoft/Cream
This is a collection of our NAS and Vision Transformer work.
ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
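A sketch following the MLX-VLM README; the `load`/`generate`/`apply_chat_template` calls and the quantized model id reflect that documentation and may differ across versions:

```python
# Based on the MLX-VLM README; exact API may differ between versions.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"  # assumed model id
model, processor = load(model_path)
config = load_config(model_path)

images = ["cat.png"]  # hypothetical local image path
prompt = "Describe this image."

# Wrap the prompt in the model's chat template
formatted = apply_chat_template(processor, config, prompt, num_images=len(images))

output = generate(model, processor, formatted, images, verbose=False)
print(output)
```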
MCG-NJU/VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
JingyunLiang/VRT
VRT: A Video Restoration Transformer (official repository)
czczup/ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
emcf/thepipe
Get clean data from tricky documents, powered by vision-language models ⚡
pprp/awesome-attention-mechanism-in-cv
Awesome List of Attention Modules and Plug&Play Modules in Computer Vision
yitu-opensource/T2T-ViT
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
NVlabs/VoxFormer
Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
uncbiag/Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
jacobgil/vit-explain
Explainability for Vision Transformers
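One of the methods implemented here is attention rollout; the sketch below shows the idea itself in plain PyTorch (not the repo's API), with shapes chosen for illustration rather than taken from any particular model:

```python
import torch

def attention_rollout(attentions):
    """Attention rollout (Abnar & Zuidema, 2020): recursively multiply
    per-layer attention maps, adding identity for residual connections.

    attentions: list of tensors, each (num_heads, tokens, tokens)
    returns: (tokens, tokens) rollout matrix; row 0 gives the CLS token's
    effective attention over all input tokens.
    """
    result = None
    for attn in attentions:
        # Average over heads, then account for the residual branch
        attn = attn.mean(dim=0)
        attn = attn + torch.eye(attn.size(-1))
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalize rows
        result = attn if result is None else attn @ result
    return result

# Demo with random "attention" maps: 12 layers, 3 heads, 197 tokens
layers = [torch.softmax(torch.randn(3, 197, 197), dim=-1) for _ in range(12)]
rollout = attention_rollout(layers)
cls_to_patches = rollout[0, 1:]  # CLS attention over the 196 patch tokens
print(cls_to_patches.shape)  # torch.Size([196])
```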
WangLibo1995/GeoSeg
UNetFormer: a UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery (ISPRS). Also includes other vision transformers and CNNs for satellite, aerial, and UAV image segmentation.
sithu31296/semantic-segmentation
SOTA Semantic Segmentation Models in PyTorch
LeapLabTHU/DAT
Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
hustvl/YOLOS
[NeurIPS 2021] You Only Look at One Sequence
NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention