Pinned Repositories
.github
GenerateU
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
GLEE
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
OmniTokenizer
OmniTokenizer: one model and one weight for image-video joint tokenization.
UniRef
[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces
vaex
🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook
VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
FoundationVision's Repositories
FoundationVision/VAR
FoundationVision/LlamaGen
FoundationVision/GLEE
FoundationVision/Groma
FoundationVision/UniRef
FoundationVision/OmniTokenizer
FoundationVision/GenerateU
FoundationVision/vaex
FoundationVision/.github