audio-generation-papers: A repository from RevoSpeech Technology

Audio Generation Papers

Recent audio generation (and audio codec) papers, covering speech, music, and general audio.

Year	Org.	Name	Title	Paper	Demo	Code
2020	OpenAI	Jukebox	Jukebox: A Generative Model for Music	[2005.00341]	[demo]	[code]
2021	Google	SoundStream	SoundStream: An end-to-end neural audio codec	[2107.03312]	[demo]	[code] [code]
2021	IRCAM	RAVE	RAVE: A variational autoencoder for fast and high-quality neural audio synthesis	[2111.05011]	[demo]	[code]
2022	Google	Perceiver-AR	General-purpose, long-context autoregressive modeling with Perceiver AR	[2202.07765]	[demo]	[code] [code]
2022	Stanford	SaShiMi	It's Raw! Audio Generation with State-Space Models	[2202.09729]	[demo]	[code]
2022	Baidu	A3T	A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing	[2203.09690]	[demo]	[code]
2022	SJTU	VQTTS	VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature	[2204.00768]	[demo]	[code]
2022	Google	Spectrogram Diffusion	Multi-instrument Music Synthesis with Spectrogram Diffusion	[2206.05408]	[demo]	-
2022	Microsoft	DelightfulTTS 2	DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders	[2207.04646]	[demo]	-
2022	Google	MuLan	MuLan: A joint embedding of music audio and natural language	[2208.12415]	-	[code]
2022	Google	AudioLM	AudioLM: a Language Modeling Approach to Audio Generation	[2209.03143]	[demo]	[code]
2022	Meta AI	AudioGen	AudioGen: Textually Guided Audio Generation	[2209.15352]	[demo]	-
2022	Microsoft	Museformer	Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation	[2210.10349]	[demo]	[code]
2022	Meta AI	EnCodec	High Fidelity Neural Audio Compression	[2210.13438]	[demo]	[code]
2022	Meta AI	Modified AudioGen	Audio Language Modeling using Perceptually-Guided Discrete Representations	[2211.01223]	-	-
2022	Baidu	ERNIE-SAT	ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech	[2211.03545]	[demo]	[code]
2022	Microsoft	PromptTTS	PromptTTS: Controllable Text-to-Speech with Text Descriptions	[2211.12171]	[demo]	-
2023	Microsoft	VALL-E	Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers	[2301.02111]	[demo]	[code]
2023	-	Msanii	Msanii: High Fidelity Music Synthesis on a Shoestring Budget	[2301.06468]	-	[code]
2023	Google	MusicLM	MusicLM: Generating Music From Text	[2301.11325]	[demo]	[code]
2023	ETH	Moûsai	Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion	[2301.11757]	[demo]	[code]
2023	CVSSP	AudioLDM	AudioLDM: Text-to-Audio Generation with Latent Diffusion Models	[2301.12503]	[demo]	[code]
2023	ByteDance	Make-An-Audio	Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models	[2301.12661]	[demo]	-
2023	Google	SingSong	SingSong: Generating musical accompaniments from singing	[2301.12662]	[demo]	-
2023	ETH	ArchiSound	ArchiSound: Audio Generation with Diffusion	[2301.13267]	[demo]	[code]
2023	Tencent	InstructTTS	InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt	[2301.13662]	[demo]	-
2023	Sapienza University	MSDM	Multi-Source Diffusion Models for Simultaneous Music Generation and Separation	[2302.02257]	[demo]	[code]
2023	Google	SPEAR-TTS	Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision	[2302.03540]	[demo]	[code]
2023	Google	Noise2Music	Noise2Music: Text-conditioned Music Generation with Diffusion Models	[2302.03917]	[demo]	-
2023	CMU	MQTTS	A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech	[2302.04215]	[demo]	[code]
2023	Baidu	ERNIE-Music	ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models	[2302.04456]	-	-
2023	Microsoft	FoundationTTS	FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model	[2303.02939]	[demo]	-
2023	Microsoft	VALL-E X	Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling	[2303.03926]	[demo]	-
