Awesome-CVPR2024-AIGC
A Collection of Papers and Codes for CVPR2024 AIGC
This repository collects this year's CVPR papers and code related to AIGC, listed below.
Please feel free to star, fork, or open a PR if you find this helpful~
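Please credit the source when referencing or reposting.
CVPR 2024 official website: https://cvpr.thecvf.com/Conferences/2024
Full list of CVPR papers:
Conference dates: June 17 to June 21, 2024
Paper acceptance announcement date: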
【Contents】
1. Image Generation/Image Synthesis
CapHuman: Capture Your Moments in Parallel Universes
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Efficient Dataset Distillation via Minimax Diffusion
InstanceDiffusion: Instance-level Control for Image Generation
Instruct-Imagen: Image Generation with Multi-modal Instruction
MACE: Mass Concept Erasure in Diffusion Models
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Residual Denoising Diffusion Models
Edit One for All: Interactive Batch Image Editing
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
3. Video Generation/Video Synthesis
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
DisCo: Disentangled Control for Realistic Human Dance Generation
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
5. 3D Generation/3D Synthesis
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
EscherNet: A Generative Model for Scalable View Synthesis
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
MoMask: Generative Masked Modeling of 3D Human Motions
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
SEED-Bench: Benchmarking Multimodal Large Language Models
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Continuously updated~
CVPR 2024 Papers and Open-Source Projects Collection (Papers with Code)