ilovecv

ilovecv's Stars

OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Language: Python · 12.8k 106 593894
advimman/lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Language: Jupyter Notebook · 8.1k 85 257865
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
Language: Python · 6.1k 52 630478
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Language: Python · 3.3k 59 102331
LLaVA-VL/LLaVA-NeXT
Language: Python · 3k 36 315257
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Language: Python · 1.8k 26 50111
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Language: Python · 1.8k 21 69116
apple/ml-4m
4M: Massively Multimodal Masked Modeling
Language: Python · 1.6k 33 2596
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Language: Python · 1.4k 20 6856
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Language: Python · 1.3k 23 5350
Yujun-Shi/DragDiffusion
[CVPR2024, Highlight] Official code for DragDiffusion
Language: Python · 1.2k 26 6888
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Language: Python · 1.1k 42 4757
tencent-ailab/persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Language: Python · 907 17 863
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Language: Python · 877 7 6144
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Language: Python · 711 20 3929
SkalskiP/top-cvpr-2024-papers
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
Language: Python · 666 14 359
AILab-CVC/SEED
Official implementation of SEED-LLaMA (ICLR 2024).
Language: Python · 584 15 5032
bytedance/1d-tokenizer
This repo contains the code for 1D tokenizer and generator
Language: Jupyter Notebook · 567 13 4724
cientgu/InstructDiffusion
PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
Language: Python · 396 10 2421
ShihaoZhaoZSH/LaVi-Bridge
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Language: Python · 314 16 1622
weixi-feng/LayoutGPT
Official repo for LayoutGPT
Language: Python · 304 13 2020
WisconsinAIVision/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Language: Python · 299 5 3121
TencentARC/SmartEdit
Official code of SmartEdit [CVPR-2024 Highlight]
Language: Python · 261 13 438
lucidrains/mmdit
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch
Language: Python · 259 3 35
NJU-PCALab/OpenVid-1M
Language: Python · 199 3 164
djghosh13/geneval
GenEval: An object-focused framework for evaluating text-to-image alignment
Language: HTML · 124 1 87
krennic999/STAR
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations
123 23 91
Monalissaa/DisenDiff
[CVPR 2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization
Language: Python · 86 3 62
EternalEvan/FlowIE
This repository contains the official implementation of "FlowIE: Efficient Image Enhancement via Rectified Flow"
Language: Python · 82 3 162
kyegomez/MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
Language: Python · 23 3 01