GCC-PHAT_DNN_Loc

DNN-based binaural sound localization model using GCC-PHAT features


GCC-PHAT based DNN localization method

Baseline system in "End-to-End Binaural Sound Localisation from the Raw Waveform" [1]

Framework

Dataset

Binaural signals are synthesized using BRIRs.

  • BRIRs

    Surrey binaural room impulse response (BRIR) database, which includes an anechoic room and 4 reverberant rooms.

    Room        A      B      C      D
    RT_60 (s)   0.32   0.47   0.68   0.89
    DRR (dB)    6.09   5.31   8.82   6.12
  • Sound source

    TIMIT sentences

    Sentences per azimuth

    Train   Validate   Evaluate
    24      6          15
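As a rough illustration of how a binaural signal can be synthesized from a BRIR, the sketch below convolves a mono source with the left-ear and right-ear impulse responses. The function name `synthesize_binaural` and the BRIR array layout (shape `(n_taps, 2)`) are assumptions made for this example and are not taken from the repository.

```python
import numpy as np


def synthesize_binaural(source, brir):
    """Convolve a mono source with a two-channel BRIR.

    source: 1-D array, mono signal (e.g. a TIMIT sentence).
    brir:   2-D array of shape (n_taps, 2), columns = left/right ear
            (layout assumed for this sketch).
    Returns an array of shape (len(source) + n_taps - 1, 2).
    """
    source = np.asarray(source, dtype=float)
    brir = np.asarray(brir, dtype=float)
    left = np.convolve(source, brir[:, 0])
    right = np.convolve(source, brir[:, 1])
    return np.stack([left, right], axis=1)


# Toy input: a unit impulse and a BRIR whose right-ear response is
# delayed by 2 samples and attenuated.
source = np.array([1.0, 0.0, 0.0, 0.0])
brir = np.array([[1.0, 0.0],
                 [0.0, 0.0],
                 [0.0, 0.5]])
binaural = synthesize_binaural(source, brir)
```

With real data, `source` would be a speech waveform and `brir` the measured response for one room and azimuth, both at the same sampling rate.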

Cue extractor

Normally, features are normalized before being fed into the network. If the feature dimensions are independent variables, normalization is applied to each dimension separately. For GCC-PHAT, however, what matters is the peak position, in other words the relative values across dimensions, so the same normalization coefficient should be used for all dimensions.

Two types of normalization are tested here:

  • separate_norm: each dimension is normalized separately
  • overall_norm: all dimensions are normalized with the same factor

E.g. (figure with two panels: separate_norm and overall_norm)
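The difference between the two schemes can be sketched with NumPy as follows. The text does not say which statistics serve as normalization coefficients, so this sketch assumes mean and standard deviation. The function names simply mirror the labels above and are not the repository's code.

```python
import numpy as np


def separate_norm(feats):
    """Normalize each GCC-PHAT lag (column) with its own mean and std."""
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / std


def overall_norm(feats):
    """Normalize all lags with one shared mean and std."""
    return (feats - feats.mean()) / feats.std()


# Toy feature matrix: rows = frames, columns = lags.
# The peak of every frame is at lag index 1.
feats = np.array([[0.1, 0.9, 0.2],
                  [0.2, 0.8, 0.1],
                  [0.1, 1.0, 0.3]])

sep = separate_norm(feats)
ovr = overall_norm(feats)

# overall_norm is a single affine map with positive scale,
# so the peak lag of every frame is preserved.
peaks_kept = np.array_equal(ovr.argmax(axis=1), feats.argmax(axis=1))
```

In this toy input, `separate_norm` moves the maximum of the second frame to a different lag, whereas `overall_norm` leaves every peak position unchanged.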

Model training

Multi-conditional training (MCT)

Each time, one reverberant room was held out for evaluation, and the other 3 reverberant rooms and the anechoic room were used for model training.
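The hold-one-room-out scheme described above can be enumerated as in the following sketch. The room labels and the helper name `mct_splits` are illustrative assumptions.

```python
REVERB_ROOMS = ["A", "B", "C", "D"]
ANECHOIC = "Anechoic"  # label assumed for this sketch


def mct_splits(reverb_rooms=REVERB_ROOMS, anechoic=ANECHOIC):
    """Yield (train_rooms, test_room) pairs, holding out one reverberant room."""
    for test_room in reverb_rooms:
        train_rooms = [anechoic] + [r for r in reverb_rooms if r != test_room]
        yield train_rooms, test_room


splits = list(mct_splits())
for train_rooms, test_room in splits:
    print(f"test on {test_room}: train on {train_rooms}")
```

Each of the four splits trains on four rooms (anechoic plus three reverberant) and evaluates on the remaining reverberant room.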

Evaluation

Localization results are reported every 25 frames to account for silent frames. The RMSE of the estimated sound azimuth is used as the performance metric. For more stable results, evaluation is run on 4 different test sets and the RMSEs are averaged (this is not done in the reference paper).
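One way to obtain a result every 25 frames is sketched below. It assumes that the network outputs a posterior over candidate azimuths per frame and that the posteriors are averaged over non-overlapping 25-frame chunks. The text does not confirm this pooling rule, and the names `chunk_azimuths` and `rmse` are hypothetical.

```python
import numpy as np


def chunk_azimuths(frame_probs, azimuths, chunk_len=25):
    """Average frame-level posteriors over non-overlapping chunks of
    `chunk_len` frames and pick the most likely azimuth per chunk.
    Trailing frames that do not fill a whole chunk are dropped."""
    n_chunks = frame_probs.shape[0] // chunk_len
    trimmed = frame_probs[:n_chunks * chunk_len]
    chunk_probs = trimmed.reshape(n_chunks, chunk_len, -1).mean(axis=1)
    return azimuths[chunk_probs.argmax(axis=1)]


def rmse(estimates, target):
    """Root-mean-square error between estimated and true azimuth."""
    return float(np.sqrt(np.mean((np.asarray(estimates) - target) ** 2)))


# Toy example: 3 candidate azimuths, 50 frames, true azimuth 0 degrees.
azimuths = np.array([-10.0, 0.0, 10.0])
frame_probs = np.zeros((50, 3))
frame_probs[:25, 1] = 1.0      # first chunk: frames vote for 0 degrees
frame_probs[25:, 2] = 1.0      # second chunk: frames vote for 10 degrees
frame_probs[5:10] = 1.0 / 3.0  # uninformative (e.g. silent) frames

estimates = chunk_azimuths(frame_probs, azimuths)
error = rmse(estimates, target=0.0)
```

Pooling over a chunk lets a few uninformative frames be outvoted by the rest. The per-test-set RMSEs obtained this way would then be averaged across the 4 test sets.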

Room            A     B     C     D
Paper           2.7   3.3   3.1   5.2
separate_norm   0.5   1.6   1.1   3.3
overall_norm    0.6   1.7   1.1   3.3

Stability of model training

For room D, the model was trained 3 times. Although similar losses were reached, the test results vary.

mean: 3.39 std: 0.07

Main Dependencies

Generate dataset

  1. Align BRIRs (optional)

    Align the BRIRs of the reverberant rooms to the BRIRs of the anechoic room.

  2. Synthesize spatial recordings

  3. Calculate GCC-PHAT features

  4. Calculate normalization coefficients of GCC-PHAT features
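Step 3 above can be sketched for a single frame as follows. This is a minimal GCC-PHAT implementation under stated assumptions: the frame length, the lag range `max_lag`, the sign convention (a positive lag means the left channel is delayed) and the function name are choices made for this example and may differ from the repository.

```python
import numpy as np


def gcc_phat(left, right, max_lag):
    """GCC-PHAT of one frame for lags -max_lag..max_lag (in samples)."""
    n_fft = len(left) + len(right)        # zero-pad to avoid circular wrap-around
    spec_l = np.fft.rfft(left, n=n_fft)
    spec_r = np.fft.rfft(right, n=n_fft)
    cross = spec_l * np.conj(spec_r)      # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep the phase only
    cc = np.fft.irfft(cross, n=n_fft)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])


# Toy frame: the left channel is the right channel delayed by 3 samples.
rng = np.random.default_rng(0)
src = rng.standard_normal(200)
right = np.pad(src, (0, 56))
left = np.pad(src, (3, 53))

max_lag = 16
feature = gcc_phat(left, right, max_lag)
lags = np.arange(-max_lag, max_lag + 1)
peak_lag = int(lags[np.argmax(feature)])
```

The resulting vector has `2 * max_lag + 1` entries per frame, and its peak position encodes the interaural time difference. That is why a shared normalization factor, which preserves the peak, is a natural fit for these features.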

Reference

  1. Vecchiotti, Paolo, Ning Ma, Stefano Squartini, and Guy J. Brown. "End-to-End Binaural Sound Localisation from the Raw Waveform." In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 451–455. IEEE, 2019.