xhl-video's Stars
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Million-Length Context
FoundationVision/VAR
[NeurIPS 2024 Best Paper] [GPT beats diffusion] [scaling laws in visual generation] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
google-research/big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
ytongbai/LVM
MCG-NJU/VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
apple/ml-aim
This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects.
Sense-X/UniFormer
[ICLR 2022] Official implementation of UniFormer
bytedance/ibot
iBOT: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
facebookresearch/flip
Official Open Source code for "Scaling Language-Image Pre-training via Masking"
UCSC-VLAA/CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
Beckschen/3D-TransUNet
This is the official repository for the paper "3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers"
ytongbai/ViTs-vs-CNNs
[NeurIPS 2021] Are Transformers More Robust Than CNNs? (PyTorch implementation & checkpoints)
UCSC-VLAA/RobustCNN
[ICLR 2023] This repository includes the official implementation of our paper "Can CNNs Be More Robust Than Transformers?"
UCSC-VLAA/Recap-DataComp-1B
This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
ggjy/DeLVM
UCSC-VLAA/DMAE
[CVPR 2023] This repository includes the official implementation of our paper "Masked Autoencoders Enable Efficient Knowledge Distillers"
OliverRensu/D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Learners"
patil-suraj/vit-vqgan
JAX implementation of ViT-VQGAN
OliverRensu/ARM
This repository is the official implementation of our paper "Autoregressive Pretraining with Mamba in Vision".
nazmul-karim170/UNICON
[CVPR 2022] Official implementation of "UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning"
OliverRensu/MVG
UCSC-VLAA/CRATE-alpha
This repository includes the official implementation our paper "Scaling White-Box Transformers for Vision"
UCSC-VLAA/EVP
[TMLR'24] This repository includes the official implementation our paper "Unleashing the Power of Visual Prompting At the Pixel Level"
meijieru/fast_advprop
[ICLR 2022]: Fast AdvProp
yuyinzhou/L2B
This repository includes the official implementation of L2B, from our paper "Learning to Bootstrap for Combating Label Noise".
UCSC-VLAA/CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
UCSC-VLAA/FedConv
[TMLR'24] This repository includes the official implementation our paper "FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning"
UCSC-VLAA/AdvXL
[CVPR 2024] This repository includes the official implementation of our paper "Revisiting Adversarial Training at Scale"
UCSC-VLAA/Image-Pretraining-for-Video
[ECCV 2022] This repository includes the official implementation of our paper "In Defense of Image Pre-Training for Spatiotemporal Recognition".