ilovecv's Stars
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Kwai-Kolors/Kolors
Kolors Team
Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
XLabs-AI/x-flux
TencentARC/BrushNet
[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
OpenGVLab/VisionLLM
VisionLLM Series
NVlabs/MambaVision
Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
TencentARC/SEED-Story
SEED-Story: Multimodal Long Story Generation with Large Language Model
GAIR-NLP/anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
buoyancy99/diffusion-forcing
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
limuloo/MIGC
[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)
frank-xwang/InstanceDiffusion
[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
Zj-BinXia/DiffIR
This project is the official implementation of 'Diffir: Efficient diffusion model for image restoration', ICCV2023
dvlab-research/LLMGA
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral
mira-space/MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
tryonlabs/tryondiffusion
TryOnDiffusion: A Tale of Two UNets Implementation
showlab/BoxDiff
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
feizc/DiT-MoE
Scaling Diffusion Transformers with Mixture of Experts
OpenGVLab/MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
xichenpan/ARLDM
Official Pytorch Implementation of Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
HJYao00/DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
AFeng-x/PixWizard
microsoft/ReCo
ReCo: Region-Controlled Text-to-Image Generation, CVPR 2023
Yangyi-Chen/SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
Kwai-Kolors/MPS
fusiming3/MARS
Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
zjlww/ardit-web