This repository is a collection of papers and learning resources related to speech language models. Please feel free to suggest additions!
- Blog posts from Dr. Hongyu Gong
- [arXiv] [demo] SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
- [arXiv] [code] SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks
- [arXiv] Prompting Large Language Models with Speech Recognition Abilities
- [arXiv] [demo] AudioPaLM: A Large Language Model That Can Speak and Listen
- [arXiv] [demo] Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
- [arXiv] [demo] PolyVoice: Language Models for Speech to Speech Translation
- [arXiv] [demo] Make-A-Voice: Unified Voice Synthesis With Discrete Representation
- [arXiv] [demo] NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
- [arXiv] [demo] [code] MusicGen: Simple and Controllable Music Generation
- [arXiv] [demo] TWIST: Textually Pretrained Speech Language Models
- [arXiv] [demo] SoundStorm: Efficient Parallel Audio Generation
- [arXiv] VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
- [arXiv] [demo] VALL-E X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
- [arXiv] [demo] VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
- [arXiv] [demo] SPEAR-TTS: Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
- [arXiv] [demo] MusicLM: Generating Music From Text
- [ICLR] [arXiv] [demo] AudioGen: Textually Guided Audio Generation
- [ICASSP] [arXiv] SpeechLMScore: Evaluating Speech Generation Using Speech Language Model
- [TACL] [demo] [code] dGSLM: Generative Spoken Dialogue Language Modeling
- [TASLP] [arXiv] [demo] AudioLM: A Language Modeling Approach to Audio Generation