| 2118 | Using Text Injection to Improve Recognition of Personal Identifiers in Speech | ➖ | ➖ |
| 837 | Investigating Wav2Vec2 Context Representations and the Effects of Fine-tuning, a Case-study of a Finnish Model | | ➖ |
| 872 | Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech | ➖ | ➖ |
| 177 | Iteratively Improving Speech Recognition and Voice Conversion | | |
| 2001 | LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource ASR Systems | ➖ | |
| 746 | TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition | ➖ | |
| 1124 | Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR | ➖ | ➖ |
| 2417 | GhostRNN: Reducing State Redundancy in RNN with Cheap Operations | ➖ | ➖ |
| 1442 | Task-Agnostic Structured Pruning of Speech Representation Models | ➖ | |
| 485 | Factual Consistency Oriented Speech Recognition | ➖ | |
| 1036 | Multi-Head State Space Model for Speech Recognition | ➖ | |
| 341 | Cascaded Multi-task Adaptive Learning Based on Neural Architecture Search | ➖ | ➖ |
| 2359 | Probing Self-supervised Speech Models for Phonetic and Phonemic Information: a Case Study in Aspiration | ➖ | |
| 739 | Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers | ➖ | |
| 213 | A More Accurate Internal Language Model Score Estimation for the Hybrid Autoregressive Transducer | ➖ | ➖ |
| 2280 | Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data | ➖ | |
| 2585 | OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | ➖ | |
| 1316 | ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | | |
| 2389 | General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization | ➖ | ➖ |
| 275 | Joint Instance Reconstruction and Feature Sub-space Alignment for Cross-Domain Speech Emotion Recognition | ➖ | ➖ |
| 106 | Attention Gate between Capsules in Fully Capsule-network Speech Recognition | ➖ | ➖ |
| 1272 | Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition | ➖ | |
| 1189 | Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers | ➖ | |
| 223 | Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding | ➖ | |
| 923 | Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation | | |
| 2258 | Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts | ➖ | |
| 1184 | DCCRN-KWS: An Audio Bias based Model for Noise Robust Small-footprint Keyword Spotting | ➖ | |
| 1609 | OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition | ➖ | |
| 2136 | Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition | ➖ | |
| 788 | Rehearsal-Free Online Continual Learning for Automatic Speech Recognition | | |
| 496 | ASR Data Augmentation in Low-resource Settings using Cross-lingual Multi-speaker TTS and Cross-lingual Voice Conversion | | |
| 642 | Personality-aware Training based Speaker Adaptation for End-to-End Speech Recognition | ➖ | ➖ |
| 2257 | Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences | ➖ | ➖ |
| 679 | Wave to Syntax: Probing Spoken Language Models for Syntax | | |
| 720 | Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR | ➖ | |
| 630 | Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | ➖ | |
| 1118 | SlothSpeech: Denial-of-service Attack Against Speech Recognition Models | | |
| 503 | CLRL-Tuning: A Novel Continual Learning Approach for Automatic Speech Recognition | ➖ | ➖ |
| 159 | Exploring Sources of Racial Bias in Automatic Speech Recognition through the Lens of Rhythmic Variation | ➖ | ➖ |
| 1440 | Can Contextual Biasing Remain Effective with Whisper and GPT-2? | ➖ | |
| 221 | Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation | | |
| 2207 | Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition | ➖ | ➖ |
| 1216 | MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition | ➖ | ➖ |
| 1192 | Improving Chinese Mandarin Speech Recognition using Graph Embedding Regularization | ➖ | ➖ |
| 1276 | Adapting Multi-Lingual ASR Models for Handling Multiple Talkers | ➖ | |
| 1221 | Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition | ➖ | ➖ |
| 1010 | Model-Internal Slot-triggered Biasing for Domain Expansion in Neural Transducer ASR Models | ➖ | |
| 2508 | Delay-penalized CTC implemented based on Finite State Transducer | | |
| 101 | Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | | |
| 1064 | MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations | | |
| 1422 | Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator | ➖ | |
| 1413 | Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification | | |
| 2589 | Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR | | |
| 1091 | Domain Adaptive Self-supervised Training of Automatic Speech Recognition | ➖ | ➖ |
| 1105 | There is more than One Kind of Robustness: Fooling Whisper with Adversarial Examples | | |
| 1176 | Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute | ➖ | |
| 759 | Blank-regularized CTC for Frame Skipping in Neural Transducer | ➖ | |
| 2406 | The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR | ➖ | |
| 2354 | Improving RNN-Transducers with Acoustic LookAhead | ➖ | |
| 1847 | Everyone has an Accent | ➖ | ➖ |
| 2124 | Some Voices are too Common: Building Fair Speech Recognition Systems using the Common-Voice Dataset | ➖ | |
| 1168 | Information Magnitude Based Dynamic Sub-sampling for Speech-to-text | ➖ | ➖ |
| 353 | Towards Multi-task Learning of Speech and Speaker Recognition | | |
| 2186 | Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR | ➖ | ➖ |
| 1012 | 2-bit Conformer Quantization for Automatic Speech Recognition | ➖ | |
| 167 | Time-Domain Speech Enhancement for Robust Automatic Speech Recognition | ➖ | |
| 257 | Multi-channel Multi-speaker Transformer for Speech Recognition | ➖ | ➖ |
| 733 | Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion | ➖ | |
| 2463 | Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR | ➖ | ➖ |
| 767 | Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network | ➖ | |
| 970 | Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think | ➖ | |
| 791 | MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition | ➖ | |
| 2499 | Biased Self-supervised Learning for ASR | ➖ | |
| 1300 | A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions | ➖ | ➖ |
| 2470 | Wav2Vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting | ➖ | ➖ |
| 770 | BAT: Boundary Aware Transducer for Memory-efficient and Low-latency ASR | ➖ | |
| 1342 | Bayes Risk Transducer: Transducer with Controllable Alignment Prediction | ➖ | ➖ |
| 783 | Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition | ➖ | |