/minddiffusion

A collection of diffusion models based on MindSpore

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Awesome MindDiffusion Models

Introduction

This repository is an open-source collection of classic and recent state-of-the-art (SoTA) diffusion models implemented in MindSpore. It also provides a curated list of notable diffusion models.

Released now

These models are implemented in MindSpore and run on Ascend.

| Model Name | Task | Link |
| --- | --- | --- |
| Taichu-GLIDE | Vision - Text To Image | Link |
| Wukong-Huahua | Vision - Text To Image | Link |
| Stable-Diffusionv2 | Vision - Text To Image | Link |

Awesome Model List

| Model | Paper | Institution | Date | Conference | Support |
| --- | --- | --- | --- | --- | --- |
| DDPM | DDPM: Denoising Diffusion Probabilistic Models | UC Berkeley | Jun 2020 | NeurIPS 2020 | To do |
| Vision - Image Generation | | | | | |
| Improved diffusion | Improved Denoising Diffusion Probabilistic Models | OpenAI | Feb 2021 | PMLR 2021 | |
| Guided diffusion | Diffusion Models Beat GANs on Image Synthesis | OpenAI | Apr 2021 | NeurIPS 2021 | |
| ADM | Diffusion Models Beat GANs on Image Synthesis | OpenAI | Apr 2021 | NeurIPS 2021 | |
| FastDPM | On Fast Sampling of Diffusion Probabilistic Models | NVIDIA | May 2021 | ICLR Workshop 2021 | |
| LSGM | Score-based Generative Modeling in Latent Space | NVIDIA | Jun 2021 | NeurIPS 2021 | |
| Distilled-DM | Progressive Distillation for Fast Sampling of Diffusion Models | Google Brain | Feb 2022 | ICLR 2022 | |
| GGDM | Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality | Google Brain | Feb 2022 | ICLR 2022 | |
| Vision - Text to Image | | | | | |
| Stable Diffusion/LDM | High-Resolution Image Synthesis with Latent Diffusion Models | Stability.AI | Dec 2021 | | |
| Glide | Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | OpenAI | Dec 2021 | | |
| Dalle-2 | Hierarchical Text-conditional Image Generation with CLIP Latents | OpenAI | Apr 2022 | | |
| KNN Diffusion | Image Generation via Large-Scale Retrieval | Meta AI | Apr 2022 | | |
| Imagen | Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | Google Brain | May 2022 | | |
| LAION-RDM | Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models | Ludwig-Maximilian University of Munich | Jul 2022 | | |
| DreamBooth | DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | Google Research | Aug 2022 | | |
| DreamFusion | DreamFusion: Text-to-3D using 2D Diffusion | Google Research | 29 Sep 2022 | | |
| Vision - Image Editing | | | | | |
| SDEdit | SDEdit: Image Synthesis and Editing with Stochastic Differential Equations | Stanford U & CMU | Aug 2021 | ICLR 2022 | |
| RePaint | RePaint: Inpainting using Denoising Diffusion Probabilistic Models | ETH Zurich | Jan 2022 | CVPR 2022 | |
| Vision - Video Generation | | | | | |
| Video diffusion models | Video Diffusion Models | Google Brain | Apr 2022 | ICLR 2022 Workshop | |
| MCVD | MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | University of Montreal | May 2022 | | |
| Make-A-Video | Make-A-Video: Text-to-Video Generation without Text-Video Data | Meta AI | 29 Sep 2022 | | |
| Imagen Video | Imagen Video: High Definition Video Generation with Diffusion Models | Google Brain | 5 Oct 2022 | | |
| Natural Language | | | | | |
| Diffusion-LM | Diffusion-LM Improves Controllable Text Generation | Stanford University | May 2022 | | |
| Audio - Audio Generation | | | | | |
| DiffWave | DiffWave: A Versatile Diffusion Model for Audio Synthesis | NVIDIA & Baidu | Jun 2020 | ISMIR 2021 | |
| WaveGrad | WaveGrad: Estimating Gradients for Waveform Generation | Google Brain | Sep 2020 | ICLR 2021 | |
| Symbolic Music Generation | Symbolic Music Generation with Diffusion Models | Google Brain | Mar 2021 | ISMIR 2021 | |
| DiffSinger | DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism | Zhejiang University | May 2021 | AAAI 2022 | |
| VDM | Variational Diffusion Models | Google Brain | Jul 2021 | NeurIPS 2021 | |
| FastDiff | FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | Tencent AI Lab | Apr 2022 | IJCAI 2022 | |
| BDDMs | BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis | Tencent AI Lab | May 2022 | ICLR 2022 | |
| SawSing | DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation | | Aug 2022 | ISMIR 2022 | |
| Prodiff | ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Zhejiang University | Jul 2022 | ACM Multimedia 2022 | |
| Audio - Audio Conversion | | | | | |
| DiffVC | Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme | Huawei Noah | Sep 2021 | ICLR 2022 | |
| Audio - Audio Enhancement | | | | | |
| NU-Wave | NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling | MINDSLAB | Apr 2021 | Interspeech 2021 | |
| CDiffSE | Conditional Diffusion Probabilistic Model for Speech Enhancement | CMU | Feb 2022 | IEEE 2022 | |
| Audio - Text to Speech | | | | | |
| Grad-TTS | Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech | Huawei Noah | May 2021 | | |
| EdiTTS | EdiTTS: Score-based Editing for Controllable Text-to-Speech | Yale University | Oct 2021 | | |
| DiffGAN-TTS | DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs | Tencent AI Lab | Jan 2022 | | |
| Diffsound | Diffsound: Discrete Diffusion Model for Text-to-sound Generation | Tencent AI Lab | Jul 2022 | | |

Contributing

We welcome all contributions that improve this project! To contribute your diffusion models, please fork this repo and submit a pull request.