| 1173 | Robust Prototype Learning for Anomalous Sound Detection | ➖ | ➖ |
| 982 | A Multimodal Prototypical Approach for Unsupervised Sound Classification | | |
| 563 | Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms | ➖ | ➖ |
| 1082 | Adapting Language-Audio Models as Few-Shot Audio Learners | ➖ | |
| 914 | Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | | |
| 734 | TFECN: Time-Frequency Enhanced ConvNet for Audio Classification | ➖ | ➖ |
| 350 | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | ➖ | ➖ |
| 1174 | Fine-Tuning Audio Spectrogram Transformer with Task-Aware Adapters for Sound Event Detection | ➖ | ➖ |
| 1210 | Small Footprint Multi-Channel Network for Keyword Spotting with Centroid Based Awareness | ➖ | ➖ |
| 1380 | Few-Shot Class-Incremental Audio Classification using Adaptively-Refined Prototypes | ➖ | |
| 1549 | Interpretable Latent Space using Space-Filling Curves for Phonetic Analysis in Voice Conversion | | |
| 1861 | Topological Data Analysis for Speech Processing | | |
| 1329 | Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation | | |
| 932 | Personalized Acoustic Scene Classification in Ultra-Low Power Embedded Devices using Privacy-Preserving Data Augmentation | ➖ | ➖ |
| 176 | Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection | ➖ | ➖ |
| 1021 | Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning | | |
| 2416 | Anomalous Sound Detection using Self-Attention-based Frequency Pattern Analysis of Machine Sounds | ➖ | ➖ |
| 1478 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | ➖ | ➖ |
| 979 | Ontology-aware Learning and Evaluation for Audio Tagging | | |
| 575 | Differential Privacy enabled Dementia Classification: An Exploration of the Privacy-Accuracy Trade-off in Speech Signal Data | ➖ | ➖ |
| 1595 | Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech | | |
| 1816 | Towards Multi-Lingual Audio Question Answering | | ➖ |
| 477 | Wav2ToBI: A New Approach to Automatic ToBI Transcription | ➖ | ➖ |
| 1579 | MCR-Data2vec 2.0: Improving Self-Supervised Speech Pre-training via Model-Level Consistency Regularization | ➖ | |
| 591 | Anomalous Sound Detection based on Sound Separation | ➖ | |
| 2089 | Random Forest Classification of Breathing Phases from Audio Signals Recorded using Mobile Devices | ➖ | ➖ |
| 1581 | GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos | ➖ | ➖ |
| 358 | Emotion-aware Audio-Driven Face Animation via Contrastive Feature Disentanglement | ➖ | ➖ |
| 344 | Joint-Former: Jointly Regularized and Locally Down-Sampled Conformer for Semi-Supervised Sound Event Detection | ➖ | ➖ |
| 245 | Towards Attention-based Contrastive Learning for Audio Spoof Detection | ➖ | ➖ |
| 2488 | Masked Audio Modeling with CLAP and Multi-Objective Learning | ➖ | ➖ |
| 1904 | Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems | | |
| 481 | Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-Spoofing | ➖ | ➖ |
| 491 | Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR | ➖ | |
| 684 | Multi-Microphone Automatic Speech Segmentation in Meetings based on Circular Harmonics Features | ➖ | |
| 542 | Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection | ➖ | ➖ |
| 88 | Insights Into End-to-End Audio-to-Score Transcription with Real Recordings: A Case Study with Saxophone Works | ➖ | ➖ |
| 2193 | Whisper-AT: Noise-Robust Automatic Speech Recognizers are also Strong Audio Event Taggers | | |
| 1621 | Synthetic Voice Spoofing Detection based on Feature Pyramid Conformer | ➖ | ➖ |
| 1383 | Learning A Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection | ➖ | ➖ |
| 2011 | Application of Knowledge Distillation to Multi-Task Speech Representation Learning | ➖ | |
| 2297 | DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes | ➖ | |
| 1965 | Variational Classifier for Unsupervised Anomalous Sound Detection under Domain Generalization | ➖ | ➖ |
| 745 | FlexiAST: Flexibility is What AST Needs | | |
| 1344 | Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network | ➖ | |
| 613 | Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement | ➖ | ➖ |
| 1431 | An Efficient Speech Separation Network based on Recurrent Fusion Dilated Convolution and Channel Attention | ➖ | |
| 801 | Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation | ➖ | ➖ |
| 2015 | Binaural Sound Localization in Noisy Environments using Frequency-based Audio Vision Transformer (FAViT) | ➖ | ➖ |
| 1723 | Contrastive Learning based Deep Latent Masking for Music Source Separation | ➖ | ➖ |
| 655 | Speaker Extraction with Detection of Presence and Absence of Target Speakers | ➖ | ➖ |
| 889 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | ➖ | ➖ |
| 2117 | Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning | ➖ | |
| 1309 | Image-Driven Audio-Visual Universal Source Separation | ➖ | ➖ |
| 2520 | Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource | ➖ | ➖ |
| 1766 | SDNet: Stream-Attention and Dual-Feature Learning Network for Ad-hoc Array Speech Separation | ➖ | ➖ |
| 2451 | Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization | ➖ | ➖ |
| 164 | Multi-Channel Separation of Dynamic Speech and Sound Events | | ➖ |
| 2545 | Rethinking the Visual Cues in Audio-Visual Speaker Extraction | | |
| 85 | Using Semi-Supervised Learning for Monaural Time-Domain Speech Separation with a Self-Supervised Learning-based SI-SNR Estimator | ➖ | ➖ |
| 1158 | Investigation of Training Mute-Expressive End-to-End Speech Separation Networks for an Unknown Number of Speakers | ➖ | ➖ |
| 2369 | SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking | ➖ | ➖ |
| 165 | Time-Frequency Domain Filter-and-Sum Network for Multi-Channel Speech Separation | ➖ | ➖ |
| 714 | FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization | | |
| 696 | A Neural State-Space Modeling Approach to Efficient Speech Separation | ➖ | |
| 1777 | Locate and Beamform: Two-Dimensional Locating All-Neural Beamformer for Multi-Channel Speech Separation | | |
| 518 | Monaural Speech Separation Method based on Recurrent Attention with Parallel Branches | ➖ | ➖ |
| 951 | What do Self-Supervised Speech Representations Encode? An Analysis of Languages, Varieties, Speaking Styles and Speakers | ➖ | ➖ |
| 1696 | A Compressed Synthetic Speech Detection Method with Compression Feature Embedding | ➖ | ➖ |
| 572 | Outlier-aware Inlier Modeling and Multi-Scale Scoring for Anomalous Sound Detection via Multitask Learning | ➖ | ➖ |
| 263 | MOSLight: A Lightweight Data-Efficient System for Non-Intrusive Speech Quality Assessment | ➖ | ➖ |
| 1626 | A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation | | |
| 2494 | MTANet: Multi-band Time-Frequency Attention Network for Singing Melody Extraction from Polyphonic Music | ➖ | ➖ |
| 119 | XiaoiceSing 2: A High-Fidelity Singing Voice Synthesizer based on Generative Adversarial Network | | |
| 2190 | Do Vocal Breath Sounds Encode Gender Cues for Automatic Gender Classification? | ➖ | ➖ |
| 202 | Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation using Improved Differentiable Automatic Data Augmentation | ➖ | ➖ |
| 1430 | A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis | ➖ | ➖ |
| 528 | RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music | | |
| 832 | Spatialization Quality Metric for Binaural Speech | ➖ | ➖ |
| 428 | AsthmaSCELNet: A Lightweight Supervised Contrastive Embedding Learning Framework for Asthma Classification using Lung Sounds | ➖ | ➖ |
| 1426 | Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification | | |
| 2115 | Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance | ➖ | |
| 852 | AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | | |
| 209 | Obstructive Sleep Apnea Screening with Breathing Sounds and Respiratory Effort: A Multimodal Deep Learning Approach | ➖ | ➖ |
| 2275 | Investigation of Music Emotion Recognition based on Segmented Semi-Supervised Learning | ➖ | ➖ |