License: Apache-2.0

Audio Generation Papers

Recent papers on audio generation and audio codecs, covering speech, music, and general audio.

| Year | Org. | Name | Title | Paper | Demo | Code |
|------|------|------|-------|-------|------|------|
| 2020 | OpenAI | Jukebox | Jukebox: A Generative Model for Music | [2005.00341] | [demo] | [code] |
| 2021 | Google | SoundStream | SoundStream: An End-to-End Neural Audio Codec | [2107.03312] | [demo] | [code] [code] |
| 2021 | IRCAM | RAVE | RAVE: A variational autoencoder for fast and high-quality neural audio synthesis | [2111.05011] | [demo] | [code] |
| 2022 | Google | Perceiver-AR | General-purpose, long-context autoregressive modeling with Perceiver AR | [2202.07765] | [demo] | [code] [code] |
| 2022 | Stanford | SaShiMi | It's Raw! Audio Generation with State-Space Models | [2202.09729] | [demo] | [code] |
| 2022 | Baidu | A3T | A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | [2203.09690] | [demo] | [code] |
| 2022 | SJTU | VQTTS | VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature | [2204.00768] | [demo] | [code] |
| 2022 | Google | Spectrogram Diffusion | Multi-instrument Music Synthesis with Spectrogram Diffusion | [2206.05408] | [demo] | - |
| 2022 | Microsoft | DelightfulTTS 2 | DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders | [2207.04646] | [demo] | - |
| 2022 | Google | MuLan | MuLan: A Joint Embedding of Music Audio and Natural Language | [2208.12415] | - | [code] |
| 2022 | Google | AudioLM | AudioLM: a Language Modeling Approach to Audio Generation | [2209.03143] | [demo] | [code] |
| 2022 | Meta AI | AudioGen | AudioGen: Textually Guided Audio Generation | [2209.15352] | [demo] | - |
| 2022 | Microsoft | Museformer | Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation | [2210.10349] | [demo] | [code] |
| 2022 | Meta AI | EnCodec | High Fidelity Neural Audio Compression | [2210.13438] | [demo] | [code] |
| 2022 | Meta AI | Modified AudioGen | Audio Language Modeling using Perceptually-Guided Discrete Representations | [2211.01223] | - | - |
| 2022 | Baidu | ERNIE-SAT | ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | [2211.03545] | [demo] | [code] |
| 2023 | Microsoft | PromptTTS | PromptTTS: Controllable Text-to-Speech with Text Descriptions | [2211.12171] | [demo] | - |
| 2023 | Microsoft | VALL-E | Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | [2301.02111] | [demo] | [code] |
| 2023 | - | Msanii | Msanii: High Fidelity Music Synthesis on a Shoestring Budget | [2301.06468] | - | [code] |
| 2023 | Google | MusicLM | MusicLM: Generating Music From Text | [2301.11325] | [demo] | [code] |
| 2023 | ETH | Moûsai | Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | [2301.11757] | [demo] | [code] |
| 2023 | CVSSP | AudioLDM | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | [2301.12503] | [demo] | [code] |
| 2023 | ByteDance | Make-An-Audio | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | [2301.12661] | [demo] | - |
| 2023 | Google | SingSong | SingSong: Generating musical accompaniments from singing | [2301.12662] | [demo] | - |
| 2023 | ETH | ArchiSound | ArchiSound: Audio Generation with Diffusion | [2301.13267] | [demo] | [code] |
| 2023 | Tencent | InstructTTS | InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt | [2301.13662] | [demo] | - |
| 2023 | Sapienza University | MSDM | Multi-Source Diffusion Models for Simultaneous Music Generation and Separation | [2302.02257] | [demo] | [code] |
| 2023 | Google | SPEAR-TTS | Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision | [2302.03540] | [demo] | [code] |
| 2023 | Google | Noise2Music | Noise2Music: Text-conditioned Music Generation with Diffusion Models | [2302.03917] | [demo] | - |
| 2023 | CMU | MQTTS | A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | [2302.04215] | [demo] | [code] |
| 2023 | Baidu | ERNIE-Music | ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models | [2302.04456] | - | - |
| 2023 | Microsoft | FoundationTTS | FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model | [2303.02939] | [demo] | - |
| 2023 | Microsoft | VALL-E X | Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | [2303.03926] | [demo] | - |