Vision | ||||
To learn image super-resolution, use a GAN to learn how to do image degradation first | ||||
Feature Perceptual Loss for Variational Autoencoder | - Autoencoder - Loss function | |||
Context Encoder | Context Encoders: Feature Learning by Inpainting | - Self-supervised vision representation learning - Image inpainting | ||
Fixing the train-test resolution discrepancy | ||||
GANs | Generative Adversarial Nets | - GANs | ||
ImageGPT | Generative Pretraining from Pixels | - Self-supervised vision representation learning | ||
Deformable ConvNets v2: More Deformable, Better Results | - CNN | |||
Deformable Convolutional Networks | - CNN | |||
2023 | ControlNet | Adding Conditional Control to Text-to-Image Diffusion Models | - Transformer - Diffusion | |
BEIT | BEIT: BERT Pre-Training of Image Transformers | - Self-supervised vision representation learning | ||
Diffusion Illusion | Diffusion Illusions: Hiding Images in Plain Sight | - Diffusion - Illusion | ||
LVDM | Latent Video Diffusion Models for High-Fidelity Long Video Generation | - VSR-Diffusion | ||
Understanding Deformable Alignment in Video Super-Resolution | - VSR - Deformable convolution | |||
Towards Accurate Generative Models of Video: A New Metric & Challenges | - Metric | |||
Vision | Image Generation | |||
VQ-VAE-2 | Generating Diverse High-Fidelity Images with VQ-VAE-2 | - Image generation - GANs | ||
VQGAN | Taming Transformers for High-Resolution Image Synthesis | - Image generation - GANs | ||
CDM | Cascaded Diffusion Models for High Fidelity Image Generation | - Image generation | ||
Consistency Models | - Image generation | |||
DiffiT | DiffiT: Diffusion Vision Transformers for Image Generation | - Image generation - Transformer | ||
Emu | Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack | - Image generation | ||
Vision | SR | |||
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | - ISR | |||
Video LDM | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | - Video generation | ||
SwinIR: Image Restoration Using Swin Transformer | - ISR - Transformer | |||
Blind Super-Resolution Kernel Estimation using an Internal-GAN | - ISR - GANs | |||
BasicVSR | BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond | - VSR | ||
BasicVSR++ | BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment | - VSR | ||
SR3 | Image Super-Resolution via Iterative Refinement | - ISR - Diffusion | ||
SR3+ | Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild | - ISR - Diffusion | ||
Designing a Practical Degradation Model for Deep Blind Image Super-Resolution | - BISR | |||
DiffBIR | DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior | - BISR | ||
MoESR | Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | - ISR - Diffusion | ||
LIIF | Learning Continuous Image Representation with Local Implicit Image Function | - Continuous super-resolution | ||
Implicit Diffusion Models for Continuous Super-Resolution | - Continuous super-resolution | |||
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder | - Continuous super-resolution | |||
Vision-Language | ||||
2022 | Flamingo | Flamingo: a Visual Language Model for Few-Shot Learning | - Transformer | |
VideoGPT | VideoGPT: Video Generation using VQ-VAE and Transformers | |||
Video Diffusion Models | ||||
Vision-Language | Text-to-Image Generation | |||
Dall-E 3 | Improving Image Generation with Better Captions | - Text-to-image generation | ||
FIFO | FIFO-Diffusion: Generating Infinite Videos from Text without Training | - Text-to-video generation |