
A Collection of Papers and Codes for CVPR2024 AIGC



A curated collection of this year's CVPR papers and code related to AIGC, listed below.


Please feel free to star, fork, or open a PR if you find this helpful.







1. Image Generation/Image Synthesis

CapHuman: Capture Your Moments in Parallel Universes

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Efficient Dataset Distillation via Minimax Diffusion

InstanceDiffusion: Instance-level Control for Image Generation

Instruct-Imagen: Image Generation with Multi-modal Instruction

MACE: Mass Concept Erasure in Diffusion Models

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Residual Denoising Diffusion Models

2. Image Editing

Edit One for All: Interactive Batch Image Editing

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

3. Video Generation/Video Synthesis

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

DisCo: Disentangled Control for Realistic Human Dance Generation

Make Your Dream A Vlog

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

4. Video Editing

5. 3D Generation/3D Synthesis

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

EscherNet: A Generative Model for Scalable View Synthesis

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

MoMask: Generative Masked Modeling of 3D Human Motions

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

6. 3D Editing

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting


EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

SEED-Bench: Benchmarking Multimodal Large Language Models

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts



CVPR 2024 Papers and Open-Source Projects Collection (Papers with Code)
