A tutorial for Sound Source Localization researchers and practitioners. The purpose of this repo is to organize the world’s resources for Sound Source Localization, and make them universally accessible and useful.

MIT License

Awesome Sound Source Localization

Overview

This is a curated list of sound source localization tutorials, papers, libraries, datasets, tools, scripts and results.

To add items to this page, simply open a pull request.

Publications

Survey

  • A Survey of Sound Source Localization with Deep Learning Methods, The Journal of the Acoustical Society of America, 2022 [paper]

Databases

  • SLoClas: A Database for Joint Sound Localization and Classification, 2021 [paper] [note]
  • The LOCATA Challenge: Acoustic Source Localization and Tracking, TASLP 2020 [paper]

Network design

MLP

CNN

  • Deep Neural Networks for Multiple Speaker Detection and Localization, ICRA 2018 [paper] [code] [note]
  • Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network, Interspeech 2018 [paper]
  • Adaptation of Multiple Sound Source Localization Neural Networks with Weak Supervision and Domain-adversarial Training, ICASSP 2019 [paper] [code]
  • Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation, TASLP 2021 [paper] [code] [note]
  • Broadband DOA estimation using Convolutional neural networks trained with noise signals, 2017 [paper] [note]
  • Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals, JSTSP 2019 [paper] [note]
  • Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network, TASLP 2020 [paper] [note]

RNN & LSTM & GRU

  • Time Difference of Arrival Estimation of Speech Signals Using Deep Neural Networks with Integrated Time-frequency Masking, ICASSP 2019 [paper] [note]

CRNN

  • Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network, EUSIPCO 2018 [paper] [note]
  • Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks, JSTSP 2018 [paper] [note] [code]
  • CRNN-Based Multiple DoA Estimation Using Acoustic Intensity Features for Ambisonics Recordings, 2019 [paper] [note]

Attention

  • A combination of various neural networks for sound event localization and detection, DCASE 2021 Challenge
  • Sound event localization and detection using cross-modal attention and parameter sharing, DCASE 2021 Challenge

Encoder-decoder neural networks

  • PILOT: introducing Transformers for probabilistic sound event localization, Interspeech 2021 [paper]

Learning strategy

Loss function

MSE

  • Deep Neural Networks for Multiple Speaker Detection and Localization, ICRA 2018 [paper] [code] [note]
  • Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network, Interspeech 2018 [paper]
  • Adaptation of Multiple Sound Source Localization Neural Networks with Weak Supervision and Domain-adversarial Training, ICASSP 2019 [paper] [code]
  • Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation, TASLP 2021 [paper] [code] [note]
  • Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network, TASLP 2020 [paper] [note]

Cross entropy

  • Broadband DOA estimation using Convolutional neural networks trained with noise signals, 2017 [paper] [note]
  • Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals, JSTSP 2019 [paper] [note]
  • SLoClas: A Database for Joint Sound Localization and Classification, 2021 [paper] [note]
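
The two loss families listed here imply different output encodings for the direction of arrival (DOA). The sketch below is only an illustration under stated assumptions, not the formulation of any listed paper: it assumes a 360-bin azimuth grid, a one-hot target for cross entropy, and a Gaussian-shaped target (hypothetical width `sigma_deg`) for MSE. All function names are hypothetical.

```python
import math

def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def one_hot_target(doa_deg, n_bins=360):
    """Classification target: 1 in the bin nearest to the DOA, 0 elsewhere."""
    resolution = 360.0 / n_bins
    target = [0.0] * n_bins
    target[int(round(doa_deg / resolution)) % n_bins] = 1.0
    return target

def gaussian_target(doas_deg, n_bins=360, sigma_deg=8.0):
    """Regression target: a Gaussian-shaped peak at each source DOA.

    Overlapping peaks are combined with a max, so one vector can
    describe several simultaneous sources.
    """
    resolution = 360.0 / n_bins
    return [
        max(
            (math.exp(-angular_distance(k * resolution, doa) ** 2 / sigma_deg ** 2)
             for doa in doas_deg),
            default=0.0,
        )
        for k in range(n_bins)
    ]

def mse(pred, target):
    """Mean squared error between two equally long vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def cross_entropy(probs, target, eps=1e-12):
    """Cross entropy between predicted class probabilities and a target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, target))
```

With the Gaussian target, a prediction a few degrees off is penalized less than one on the opposite side of the array, whereas the one-hot cross entropy treats every wrong bin alike.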

Multi-task learning

  • Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network, Interspeech 2018 [paper]
  • Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network, TASLP 2020 [paper] [note]

Semi-supervised learning

  • Adaptation of Multiple Sound Source Localization Neural Networks with Weak Supervision and Domain-adversarial Training, ICASSP 2019 [paper] [code]
  • Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation, TASLP 2021 [paper] [code] [note]

Other improvements

  • MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources, Interspeech 2022 [paper] [code]
  • Iterative Sound Source Localization for Unknown Number of Sources, Interspeech 2022 [paper] [code]

SSL+

SSL + Separation

  • Multi-Microphone Speaker Separation based on Deep DOA Estimation, EUSIPCO 2019 [paper] [note]
  • An End-to-End Deep Learning Framework For Multiple Audio Source Separation And Localization, ICASSP 2022 [paper] [note]
  • DBnet: Doa-Driven Beamforming Network for end-to-end Reverberant Sound Source Separation, ICASSP 2021 [paper]
  • Blind Speech Separation Through Direction of Arrival Estimation Using Deep Neural Networks with a Flexibility on the Number of Speakers, MMSP 2022 [paper]

Speech Enhancement + SSL

  • Time Difference of Arrival Estimation of Speech Signals Using Deep Neural Networks with Integrated Time-frequency Masking, ICASSP 2019 [paper] [note]
  • Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking, Interspeech 2019 [paper] [note]

SSL + Speaker Recognition

  • Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction, Interspeech 2021 [paper] [note]

Tools

Framework

Link | Language | Description
pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
Beamformer | Python | Implementation of the mask-based adaptive beamformer (MVDR, GEVD, MCWF).
Time-frequency Mask | Python | Computation of time-frequency masks (PSM, IRM, IBM, IAM, ...) used as neural network training labels.
SSL | Python | Implementation of sound source localization.
Data format | Python | Format conversion between Kaldi, NumPy and MATLAB.
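
For readers unfamiliar with the mask abbreviations in the table, here is a minimal NumPy sketch of the commonly used definitions of the ideal binary mask (IBM), ideal ratio mask (IRM), ideal amplitude mask (IAM) and phase-sensitive mask (PSM). It is not the code of the linked tool. The function name `tf_masks`, the 0 dB IBM threshold, and the clipping of the PSM to [0, 1] are assumptions made for this example.

```python
import numpy as np

def tf_masks(speech_stft, noise_stft, eps=1e-8):
    """Compute training-label masks from complex STFTs of equal shape.

    speech_stft, noise_stft: complex arrays, e.g. (freq_bins, frames).
    The mixture is assumed to be the sum of the two.
    """
    mix_stft = speech_stft + noise_stft
    s_mag, n_mag, y_mag = np.abs(speech_stft), np.abs(noise_stft), np.abs(mix_stft)

    # IBM: 1 where speech dominates noise (local SNR above 0 dB), else 0.
    ibm = (s_mag > n_mag).astype(np.float32)
    # IRM: square root of the speech share of the total power.
    irm = np.sqrt(s_mag ** 2 / (s_mag ** 2 + n_mag ** 2 + eps))
    # IAM: ratio of speech magnitude to mixture magnitude (can exceed 1).
    iam = s_mag / (y_mag + eps)
    # PSM: IAM scaled by the cosine of the speech/mixture phase difference,
    # clipped here to [0, 1] (an assumption of this sketch).
    phase_diff = np.angle(speech_stft) - np.angle(mix_stft)
    psm = np.clip(iam * np.cos(phase_diff), 0.0, 1.0)

    return {"IBM": ibm, "IRM": irm, "IAM": iam, "PSM": psm}
```

Each mask has the same shape as the input STFTs, so it can serve directly as a per-bin training target.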

Audio feature extraction

Link | Language | Description
LPS | Python | Extracts the log-power spectrum, magnitude spectrum, and log-magnitude spectrum, and applies cepstral mean and variance normalization (CMVN).
MFCC | Python | This library provides common speech features for ASR including MFCCs and filterbank energies.
pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
GCC & GCC-Fbank | Python | Python code to extract features: GCC coefficients and GCCFB.
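
As an illustration of the GCC features mentioned in the table, the following is a minimal NumPy sketch of the generalized cross-correlation with phase transform (GCC-PHAT) between two microphone channels. It is not the linked implementation. The function name `gcc_phat` and its parameters are assumptions made for this example.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` with GCC-PHAT.

    Returns (tau, cc): the delay in seconds (positive when `sig` lags
    `ref`) and the correlation values for lags -max_shift..+max_shift.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular

    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))

    shift = int(np.argmax(cc)) - max_shift
    return shift / fs, cc
```

The returned `cc` vector (one per microphone pair) is what is typically stacked into a network input, while `tau` is the classical TDOA estimate.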

Audio data augmentation

Link | Language | Description
Data simulation | Python | Add reverberation or noise, or mix speakers.
RIR simulation | Python | Generation of the room impulse response (RIR) using the image method.
pyroomacoustics | Python | Pyroomacoustics is a package for audio signal processing for indoor applications.
gpuRIR | Python | Python library for Room Impulse Response (RIR) simulation with GPU acceleration.
rir_simulator_python | Python | Room impulse response simulator using Python.
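
The augmentation steps named in the table (adding reverberation and noise) can be sketched with plain NumPy as follows. This is a minimal example under stated assumptions, not the code of any linked tool: the function names `add_reverb` and `mix_at_snr` are hypothetical, signals are single-channel, and the RIR is assumed to come from one of the simulators above.

```python
import numpy as np

def add_reverb(dry, rir):
    """Convolve a dry signal with a room impulse response, keeping the input length."""
    dry = np.asarray(dry, dtype=float)
    return np.convolve(dry, np.asarray(rir, dtype=float))[: len(dry)]

def mix_at_snr(speech, noise, snr_db, rng=None):
    """Add noise to speech at a target signal-to-noise ratio in dB.

    Returns (noisy, scaled_noise). The noise is tiled if it is shorter
    than the speech, and a random segment of matching length is used.
    """
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)

    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = int(rng.integers(0, len(noise) - len(speech) + 1))
    noise = noise[start:start + len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise
```

A typical order is to reverberate the clean speech first and then add noise, so that the SNR is defined with respect to the reverberant signal.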

Datasets

Sound source localization datasets (in no particular order)

Name | Language | Pricing | Additional information
SSLR (2018) | English | Free | A collection of real robot audio recordings for the development and evaluation of sound source localization methods.
LOCATA (2018) | English | Free |
SLoClas (2021) | English | Free |

Augmentation noise sources (sorted by usage frequency in papers)

Name | Noise types | Pricing | Additional information
DEMAND (2013) | 18 | Free | The Diverse Environments Multichannel Acoustic Noise Database provides a set of recordings that allow testing of algorithms using real-world noise in a variety of settings.
115 Noise (2015) | 115 | Free | A noise bank for simulating noisy data from clean speech. Noises N1-N100 were collected by Guoning Hu; the other 15 are home-made noise types from USTC.
NoiseX-92 (1996) | 15 | Free | Database of recordings of various noises, available on 2 CD-ROMs.

Learning materials

Book or thesis

  • Deep Learning: Methods and Applications, 2016 [link]
  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 [link]

Video

  • CCF speech seminar 2020 [link]
  • Deep learning in speech by Hongyi Li, 2019 [link]

Slides

  • Deep learning in speech by Hongyi Li, 2019 [link]