Awesome Papers on Video/3D Generation and Representation
Video Generation
2023
PVDM: Video Probabilistic Diffusion Models in Projected Latent Space
[code] (CVPR 2023)
MAGVIT: Masked Generative Video Transformer
[paper][page][code(coming soon)]
MagicVideo: Efficient Video Generation With Latent Diffusion Models
Phenaki: Variable Length Video Generation From Open Domain Textual Descriptions
[paper] (ICLR 2023)
Make-A-Video: Text-to-Video Generation without Text-Video Data
(ICLR 2023)
StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
[paper][page][code] (ICLR 2023)
2022
Generating Long Videos of Dynamic Scenes
[paper][page][code] (NeurIPS 2022)
Video Diffusion Models
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
[paper][page][code] (NeurIPS 2022)
TATS: Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
DIGAN: Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
[paper][code] (ICLR 2022)
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
[paper][page][code] (ECCV 2022)
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
[paper][page][code] (CVPR 2022)
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
[paper][page][code] (CVPR 2022)
Video2StyleGAN: Disentangling Local and Global Variations in a Video
2021
CCVS: Context-aware Controllable Video Synthesis
[paper] (NeurIPS 2021)
V3GAN: Decomposing Background, Foreground and Motion for Video Generation
Playable Video Generation
[paper][code] (CVPR 2021 Oral)
Stochastic Image-to-Video Synthesis using cINNs
[paper][page][code] (CVPR 2021)
Generative Video Transformer: Can Objects be the Words?
[paper] (ICML 2021)
VideoGPT: Video Generation using VQ-VAE and Transformers
[paper][code]
StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
MoCoGAN-HD: A Good Image Generator Is What You Need for High-Resolution Video Synthesis
[paper][code] (ICLR 2021 Spotlight)
InMoDeGAN: Interpretable Motion Decomposition Generative Adversarial Network for Video Generation
Temporal Shift GAN for Large Scale Video Generation
[paper] (WACV 2021)
2020
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
[paper][page][code] (ICCV 2021 oral)
Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation
[paper][page][code] (ACM TOG 2020)
G3AN: Disentangling Appearance and Motion for Video Generation
ImaGINator: Conditional Spatio-Temporal GAN for Video Generation
[paper] (WACV 2020)
2019
Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis
[paper] (IJCAI 2019)
2018
MoCoGAN: Decomposing Motion and Content for Video Generation
[paper] (CVPR 2018)
2017
TGAN: Temporal Generative Adversarial Nets with Singular Value Clipping
[paper] (ICCV 2017)
2016
Generating Videos with Scene Dynamics
[paper] (NeurIPS 2016)
Conditional
Temporally Consistent Semantic Video Editing
Video Representation
2022
Scalable Neural Video Representations with Learnable Positional Features
[paper][page][code] (NeurIPS 2022)
MCL: Motion-Focused Contrastive Learning of Video Representations
[paper] (ICCV 2021 Oral)
2021
FAME: Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
[paper] (CVPR 2022)
TAM: Temporal Adaptive Module for Video Recognition
[paper] (ICCV 2021)
Self-supervised Video Representation Learning by Context and Motion Decoupling
[paper] (CVPR 2021)
Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion
[paper] (AAAI 2021)
Others
2022
3D-Aware Video Generation
Latent Image Animator: Learning to Animate Images via Latent Space Navigation
Note: animates a source image toward a target by finding and traversing motion directions in the latent space.
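The latent-navigation idea in the note above can be sketched as follows. This is a minimal illustration, not the paper's API: the function name, the list-based latent representation, and the scalar step size are all assumptions made here for clarity.

```python
def navigate_latent(z_source, direction, alpha):
    """Move a source latent code along a learned motion direction.

    z_source:  latent code of the source image (list of floats, toy stand-in)
    direction: learned motion direction in latent space (same length)
    alpha:     step size; varying it over time animates the source image
    """
    return [z + alpha * d for z, d in zip(z_source, direction)]


# Stepping alpha from 0 to 1 would produce a sequence of latents
# that, once decoded, animates the source toward the target motion.
frames = [navigate_latent([0.0, 1.0], [1.0, 0.0], a / 4) for a in range(5)]
```

In the actual method the directions are learned from video data and applied per-timestep, but the core operation is this linear walk in latent space.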
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
[paper] (CVPR 2022)
Note: since lower-layer spatial features are redundant across frames, gradients of the spatial model are randomly dropped for a subset of frames during backpropagation, reducing activation memory.
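The gradient-dropping idea in the note above can be sketched in plain Python. This is a toy sketch of the sampling logic only, assuming a hypothetical `stochastic_spatial_forward` helper; the real method operates on a deep-learning framework's autograd (e.g. detaching frames that were not selected).

```python
import random


def stochastic_spatial_forward(frames, spatial_fn, keep_prob, rng=None):
    """Compute per-frame spatial features, marking a random subset
    as gradient-carrying.

    For each frame, with probability keep_prob the frame is flagged to
    keep its gradients; otherwise its features would be treated as
    constants during backprop (gradients dropped), exploiting the
    redundancy of spatial features across neighboring frames.
    Returns a list of (feature, keep_grad) pairs.
    """
    rng = rng or random.Random(0)  # seeded for a reproducible sketch
    results = []
    for frame in frames:
        feature = spatial_fn(frame)
        keep_grad = rng.random() < keep_prob
        results.append((feature, keep_grad))
    return results
```

With `keep_prob=1.0` this degenerates to ordinary full backpropagation; lowering it trades gradient coverage for memory, which is viable precisely because adjacent frames carry similar low-level spatial information.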