QingtangDing/Awesome-CVPR2024-AIGC

A Collection of Papers and Codes for CVPR2024 AIGC

Awesome-CVPR2024-AIGC

This repository collects this year's CVPR papers related to AIGC, together with their code, as listed below.

Please feel free to star, fork, or open a PR if you find this helpful~

Please credit the source when referencing or reposting.

CVPR 2024 official website: https://cvpr.thecvf.com/Conferences/2024

Full list of CVPR papers:

Conference dates: June 17-21, 2024

Paper acceptance announcement date:

【Contents】

1. Image Generation/Image Synthesis
2. Image Editing
3. Video Generation/Video Synthesis
4. Video Editing
5. 3D Generation/3D Synthesis
6. 3D Editing
7. Others

1. Image Generation/Image Synthesis

CapHuman: Capture Your Moments in Parallel Universes

Paper: https://arxiv.org/abs/2402.00627
Code: https://github.com/VamosC/CapHuman

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Paper: https://arxiv.org/abs/2312.04655
Code: https://github.com/eclipse-t2i/eclipse-inference

Efficient Dataset Distillation via Minimax Diffusion

Paper: https://arxiv.org/abs/2311.15529
Code: https://github.com/vimar-gu/MinimaxDiffusion

InstanceDiffusion: Instance-level Control for Image Generation

Paper: https://arxiv.org/abs/2402.03290
Code: https://github.com/frank-xwang/InstanceDiffusion

Instruct-Imagen: Image Generation with Multi-modal Instruction

Paper: https://arxiv.org/abs/2401.01952

MACE: Mass Concept Erasure in Diffusion Models

Paper: https://arxiv.org/abs/2403.06135
Code: https://github.com/Shilin-LU/MACE

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Paper:
Code: https://github.com/limuloo/MIGC

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper: https://arxiv.org/abs/2312.04461
Code: https://github.com/TencentARC/PhotoMaker

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

2. Image Editing

Edit One for All: Interactive Batch Image Editing

Paper: https://arxiv.org/abs/2401.10219
Code: https://github.com/thaoshibe/edit-one-for-all

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

Paper: https://arxiv.org/abs/2303.17546
Code: https://github.com/Picsart-AI-Research/PAIR-Diffusion

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Paper: https://arxiv.org/abs/2312.13964
Code: https://github.com/open-mmlab/PIA

3. Video Generation/Video Synthesis

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Paper: https://arxiv.org/abs/2312.15770
Code: https://tf-t2v.github.io/

DisCo: Disentangled Control for Realistic Human Dance Generation

Paper: https://arxiv.org/abs/2307.00040
Code: https://github.com/Wangt-CN/DisCo

Make Your Dream A Vlog

Paper: https://arxiv.org/abs/2401.09414
Code: https://github.com/Vchitect/Vlogger

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Paper: https://arxiv.org/abs/2311.16813
Code: https://github.com/wenyuqing/panacea

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Paper: https://arxiv.org/abs/2402.17723
Code: https://github.com/yzxing87/Seeing-and-Hearing

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

Paper: https://arxiv.org/abs/2311.17590
Code: https://github.com/ZiqiaoPeng/SyncTalk

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Paper: https://arxiv.org/abs/2401.09047
Code: https://github.com/AILab-CVC/VideoCrafter

4. Video Editing

5. 3D Generation/3D Synthesis

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Paper: https://arxiv.org/abs/2309.00610
Code: https://github.com/hzxie/city-dreamer

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Paper: https://arxiv.org/abs/2304.00916
Code: https://github.com/yukangcao/DreamAvatar

EscherNet: A Generative Model for Scalable View Synthesis

Paper: https://arxiv.org/abs/2402.03908
Code: https://github.com/kxhit/EscherNet

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Paper: https://arxiv.org/abs/2310.08529
Code: https://github.com/hustvl/GaussianDreamer

MoMask: Generative Masked Modeling of 3D Human Motions

Paper: https://arxiv.org/abs/2312.00063
Code: https://github.com/EricGuo5513/momask-codes

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

Paper: https://arxiv.org/abs/2311.16918
Code: https://github.com/modelscope/richdreamer

6. 3D Editing

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Paper: https://arxiv.org/abs/2311.14521
Code: https://github.com/buaacyw/GaussianEditor

7. Others

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Paper: https://arxiv.org/abs/2310.11440
Code: https://github.com/evalcrafter/EvalCrafter

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper: https://arxiv.org/abs/2312.14238
Code: https://github.com/OpenGVLab/InternVL

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper: https://arxiv.org/abs/2311.06783
Code: https://github.com/Q-Future/Q-Instruct

SEED-Bench: Benchmarking Multimodal Large Language Models

Paper: https://arxiv.org/abs/2311.17092
Code: https://github.com/AILab-CVC/SEED-Bench

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Paper: https://arxiv.org/abs/2312.00784
Code: https://github.com/mu-cai/ViP-LLaVA

Continuously updated~

References

CVPR 2024 Papers and Open-Source Projects Collection (Papers with Code)

Related collections