A comprehensive list of resources about text-guided generative models.
- DALL-E 2 - Hierarchical Text-Conditional Image Generation with CLIP Latents
- Imagen - Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
- Stable Diffusion - High-Resolution Image Synthesis with Latent Diffusion Models
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
- DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance
- UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image
- LAFITE: Towards Language-Free Training for Text-to-Image Generation
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
- Prompt-to-Prompt: Latent Diffusion and Stable Diffusion implementation
- StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
- clip2latent: Text-driven sampling of a pre-trained StyleGAN using denoising diffusion and CLIP
- Stable Diffusion Notebooks: image generation, animation, panorama, and text-based real-image editing
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- Imagen Video: High Definition Video Generation with Diffusion Models
- Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions
- GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
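
Most of the diffusion-based resources above share the same basic usage pattern: a pretrained text-to-image pipeline maps a prompt to an image through iterative denoising. As a minimal sketch, assuming the Hugging Face `diffusers` library and the `runwayml/stable-diffusion-v1-5` checkpoint (neither is specified by this list, both are illustrative assumptions), sampling looks like:

```python
# Minimal text-to-image sampling sketch using Hugging Face diffusers.
# Assumes `pip install diffusers transformers torch` and a CUDA GPU;
# the checkpoint name is an assumption, not part of this list.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
)
pipe = pipe.to("cuda")

# guidance_scale controls classifier-free guidance: higher values trade
# sample diversity for closer adherence to the prompt.
image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("astronaut.png")
```

Several of the editing and personalization papers listed (e.g. DreamBooth, Textual Inversion) build on this same pipeline abstraction by fine-tuning or conditioning it rather than replacing it.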