Pinned Repositories
SEED
Official implementation of SEED-LLaMA (ICLR 2024).
Bunny
A family of lightweight multimodal models.
Emu3
Next-Token Prediction is All You Need
1d-tokenizer
This repo contains the code for the 1D tokenizer and generator.
VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
sglang
SGLang is a fast serving framework for large language models and vision language models.
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
zhangqingwu's Repositories
zhangqingwu/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence