ilovecv's Stars
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
advimman/lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
LLaVA-VL/LLaVA-NeXT
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
apple/ml-4m
4M: Massively Multimodal Masked Modeling
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Yujun-Shi/DragDiffusion
[CVPR2024, Highlight] Official code for DragDiffusion
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
tencent-ailab/persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
SkalskiP/top-cvpr-2024-papers
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
AILab-CVC/SEED
Official implementation of SEED-LLaMA (ICLR 2024).
bytedance/1d-tokenizer
This repo contains the code for 1D tokenizer and generator
cientgu/InstructDiffusion
PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
ShihaoZhaoZSH/LaVi-Bridge
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
weixi-feng/LayoutGPT
Official repo for LayoutGPT
WisconsinAIVision/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
TencentARC/SmartEdit
Official code of SmartEdit [CVPR-2024 Highlight]
lucidrains/mmdit
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch
NJU-PCALab/OpenVid-1M
djghosh13/geneval
GenEval: An object-focused framework for evaluating text-to-image alignment
krennic999/STAR
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations
Monalissaa/DisenDiff
[CVPR`2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization
EternalEvan/FlowIE
This repository contains the official implementation of "FlowIE: Efficient Image Enhancement via Rectified Flow"
kyegomez/MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"