INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. ⭐ the repository to support the advancement of speech technology!
The PDF version of the INTERSPEECH 2023 Conference Programme comprises a list of all accepted full papers, their presentation order, and their designated presentation times.
Other collections of the best AI conferences
❗ The conference table is kept up to date.
Conference | Year |
---|---|
Computer Vision (CV) | |
CVPR | 2023 |
Speech (SP) | |
ICASSP | 2023 |
Contributions to improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to create pull requests, open issues or contact me via email. Your participation is crucial to making this repository even better.
❗ Final paper links will be added post-conference.
List of sections
- Resources for Spoken Language Processing
- Speech Synthesis: Prosody and Emotion
- Statistical Machine Translation
- Self-Supervised Learning in ASR
- Prosody
- Speech Production
- Dysarthric Speech Assessment
- Speech Coding: Transmission
- Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation
- Analysis of Speech and Audio Signals
- Speech Recognition: Architecture, Search, and Linguistic Components
- Speech Recognition: Technologies and Systems for New Applications
- Lexical and Language Modeling for ASR
- Language Identification and Diarization
- Speech Quality Assessment
- Feature Modeling for ASR
- Interfacing Speech Technology and Phonetics
- Speech Synthesis: Multilinguality
- Speech Emotion Recognition
- Spoken Dialog Systems and Conversational Analysis
- Speech Coding and Enhancement
- Paralinguistics
- Speech Enhancement and Denoising
- Speech Synthesis: Evaluation
- End-to-End Spoken Dialog Systems
- Biosignal-enabled Spoken Communication
- Neural-based Speech and Acoustic Analysis
- DiGo - Dialog for Good: Speech and Language Technology for Social Good
- Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources, and Evaluation
- Speech, Voice, and Hearing Disorders
- Spoken Term Detection and Voice Search
- Models for Streaming ASR
- Source Separation
- Speech Perception
- Phonetics and Phonology: Languages and Varieties
- Speaker and Language Identification
- Speech Synthesis and Voice Conversion
- Speech and Language in Health: from Remote Monitoring to Medical Conversations
- Novel Transformer Models for ASR
- Speaker Recognition
- Cross-lingual and Multilingual ASR
- Voice Conversion
- Pathological Speech Analysis
- Multimodal Speech Emotion Recognition
- Phonetics, Phonology, and Prosody
- Speech Coding: Privacy
- Analysis of Neural Speech Representations
- End-to-end ASR
- Spoken Language Understanding, Summarization, and Information Retrieval
- Invariant and Robust Pre-trained Acoustic Models
- Speech Synthesis: Representation Learning
- Speech Perception, Production, and Acquisition
- Acoustic Model Adaptation for ASR
- Speech Synthesis: Expressivity
- Multi-modal Systems
- Question Answering from Speech
- Multi-talker Methods in Speech Processing
- Sociophonetics
- Speaker and Language Diarization
- Anti-Spoofing for Speaker Verification
- Speech Coding: Intelligibility
- New Computational Strategies for ASR Training and Inference
- MERLIon CCS Challenge: Multilingual Everyday Recordings - Language Identification On Code-Switched Child-Directed Speech
- Health-Related Speech Analysis
- Automatic Audio Classification and Audio Captioning
- Speech Synthesis
- Speech Synthesis: Controllability and Adaptation
- Search Methods and Decoding Algorithms for ASR
- Speech Signal Analysis
- Connecting Speech-science and Speech-technology for Children's Speech
- Dialog Management
- Speech Activity Detection and Modeling
- Multilingual Models for ASR
- Speech Enhancement and Bandwidth Expansion
- Articulation
- Neural Processing of Speech and Language: Encoding and Decoding the Diverse Auditory Brain
- Perception of Paralinguistics
- Technologies for Child Speech Processing
- Speech Synthesis: Multilinguality; Evaluation
- Show and Tell: Health Applications and Emotion Recognition
- Show and Tell: Speech Tools, Speech Enhancement, Speech Synthesis
- Show and Tell: Language Learning and Educational Resources
- Show and Tell: Media and Commercial Applications
Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources, and Evaluation
🆔 | Title | Repo | Paper |
---|---|---|---|
1922 | A Neural Architecture for Selective Attention to Speech Features | ➖ | ➖ |
1122 | Quantifying Informational Masking due to Masker Intelligibility in Same-Talker Speech-in-Speech Perception | ➖ | ➖ |
1476 | On the Benefits of Self-Supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions | ➖ | ➖ |
2154 | Predicting Perceptual Centers Located at Vowel Onset in German Speech using Long Short-Term Memory Networks | ➖ | ➖ |
63 | Exploring the Mutual Intelligibility Breakdown Caused by Sculpting Speech from a Competing Speech Signal | ➖ | ➖ |
2103 | Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese | ➖ | ➖ |
🆔 | Title | Repo | Paper |
---|---|---|---|
1879 | The Emergence of Obstruent-Intrinsic f0 and VOT as Cues to the Fortis/Lenis Contrast in West Central Bavarian | ➖ | ➖ |
431 | 〈'〉 in Tsimane': A Preliminary Investigation | ➖ | ➖ |
2200 | Segmental Features of Brazilian (Santa Catarina) Hunsrik | ➖ | ➖ |
2337 | Opening or Closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English | ➖ | ➖ |
295 | Increasing Aspiration of Word-Medial Fortis Plosives in Swiss Standard German | ➖ | ➖ |
1456 | Lexical Stress and Velar Palatalization in Italian: A Spatio-Temporal Interaction | ➖ | ➖ |
🆔 | Title | Repo | Paper |
---|---|---|---|
1832 | LanSER: Language-Model Supported Speech Emotion Recognition | ➖ | ➖ |
463 | Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition | ➖ | ➖ |
1591 | Emotion Label Encoding using Word Embeddings for Speech Emotion Recognition | ➖ | ➖ |
2444 | Discrimination of the Different Intents Carried by the Same Text through Integrating Multimodal Information | ➖ | ➖ |
510 | Meta-Domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-Sentiment Predictions | ➖ | ➖ |
413 | SWRR: Feature Map Classifier based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition | ➖ | ➖ |
🆔 | Title | Repo | Paper |
---|---|---|---|
206 | Aberystwyth English Pre-Aspiration in Apparent Time | ➖ | ➖ |
1154 | Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role | ➖ | ➖ |
1414 | Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland | ➖ | ➖ |
1704 | Vowel Normalisation in Latent Space for Sociolinguistics | ➖ | ➖ |
MERLIon CCS Challenge: Multilingual Everyday Recordings - Language Identification On Code-Switched Child-Directed Speech
🆔 | Title | Repo | Paper |
---|---|---|---|
2038 | Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings | ➖ | ➖ |
1668 | The Effect of Clinical Intervention on the Speech of Individuals with PTSD: Features and Recognition Performances | ➖ | ➖ |
470 | Analysis and Automatic Prediction of Exertion from Speech: Contrasting Objective and Subjective Measures Collected while Running | ➖ | ➖ |
894 | The Androids Corpus: A New Publicly Available Benchmark for Speech based Depression Detection | ➖ | ➖ |
658 | Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation | ➖ | ➖ |
839 | Acoustic Characteristics of Depression in Older Adults' Speech: the Role of Covariates | ➖ | ➖ |