- Speech Synthesis (TTS)
- Automatic Speech Recognition (ASR)
- Speech Enhancement
- Voice Conversion (VC)
- Mel-spectrogram to Waveform (Vocoder)
- Audio Generation
- Music Generation
- WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis (21.06), Chen et al. [pdf]
- Diff-TTS: A Denoising Diffusion Model for Text-to-Speech (INTERSPEECH 2021), Jeong et al. [pdf]
- Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (ICML 2021), Popov et al. [pdf]
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (21.05), Liu et al. [pdf]
- PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior (ICLR 2022), Lee et al. [pdf]
- DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs (22.01), Liu et al. [pdf]
- BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis (ICLR 2022), Lam et al. [pdf]
- ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech (ACMMM 2022), Huang et al. [pdf]
- Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models (22.06), Levkovitch et al. [pdf]
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis (IJCAI 2022), Huang et al. [pdf]
- FastDiff 2: Dually Incorporating GANs into Diffusion Models for High-Quality Speech Synthesis (22.09), Huang et al. [pdf]
- Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance (ICML 2022), Kim et al. [pdf]
- Guided-TTS 2: A Diffusion Model for High-Quality Adaptive Text-to-Speech with Untranscribed Data (22.05), Kim et al. [pdf]
- Prosody-TTS: Self-Supervised Prosody Pretraining with Latent Diffusion for Text-to-Speech (22.09), Huang et al. [pdf]
- GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models (22.10), Baas et al. [pdf]
- EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance (22.11), Kang et al. [pdf]
- Grad-StyleSpeech: Any-Speaker Adaptive Text-to-Speech Synthesis with Diffusion Models (22.11), Kang et al. [pdf]
- NoreSpeech: Knowledge Distillation Based Conditional Diffusion Model for Noise-Robust Expressive TTS (22.11), Yang et al. [pdf]
- ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech (22.12), Chen et al. [pdf]
- Text-to-Speech Synthesis Based on Latent Variable Conversion Using Diffusion Probabilistic Model and Variational Autoencoder (22.12), Yasuda et al. [pdf]
- InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt (23.01), Yang et al. [pdf]
- An Investigation into the Adaptability of a Diffusion-Based TTS Model (23.03), Chen et al. [pdf]
- NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers (23.04), Shen et al. [pdf]
- TransFusion: Transcribing Speech with Multinomial Diffusion (22.10), Baas et al. [pdf]
- Conditional Diffusion Probabilistic Model for Speech Enhancement (ICASSP 2022), Lu et al. [pdf]
- Universal Speech Enhancement with Score-Based Diffusion (22.06), Serrà et al. [pdf]
- Speech Enhancement and Dereverberation with Diffusion-Based Generative Models (22.08), Richter et al. [pdf]
- Cold Diffusion for Speech Enhancement (22.11), Yen et al. [pdf]
- DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion (ASRU 2021), Liu et al. [pdf]
- AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models (23.04), Wang et al. [pdf]