Speech Language Modeling Papers

This repository is a collection of papers and learning resources related to speech language models. Please feel free to suggest more!

Learning resources

Papers

2023

  • [arXiv] [demo] SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
  • [arXiv] [code] SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks
  • [arXiv] Prompting Large Language Models with Speech Recognition Abilities
  • [arXiv] [demo] AudioPaLM: A Large Language Model That Can Speak and Listen
  • [arXiv] [demo] Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
  • [arXiv] [demo] PolyVoice: Language Models for Speech to Speech Translation
  • [arXiv] [demo] Make-A-Voice: Unified Voice Synthesis With Discrete Representation
  • [arXiv] [demo] NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
  • [arXiv] [demo] [code] MusicGen: Simple and Controllable Music Generation
  • [arXiv] [demo] TWIST: Textually Pretrained Speech Language Models
  • [arXiv] [demo] SoundStorm: Efficient Parallel Audio Generation
  • [arXiv] VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
  • [arXiv] [demo] VALL-E X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
  • [arXiv] [demo] VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
  • [arXiv] [demo] SPEAR-TTS: Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
  • [arXiv] [demo] MusicLM: Generating Music From Text
  • [ICLR] [arXiv] [demo] AudioGen: Textually Guided Audio Generation
  • [ICASSP] [arXiv] SpeechLMScore: Evaluating Speech Generation Using Speech Language Model
  • [TACL] [demo] [code] dGSLM: Generative Spoken Dialogue Language Modeling
  • [TASLP] [arXiv] [demo] AudioLM: A Language Modeling Approach to Audio Generation

2022

  • [Interspeech] [arXiv] [code] SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
  • [ACL] [demo] [code] pGSLM: Text-Free Prosody-Aware Generative Spoken Language Modeling

2021

  • [TACL] [demo] [code] GSLM: On Generative Spoken Language Modeling from Raw Audio