WaveLoc

End-to-End binaural sound localization


An end-to-end binaural sound localization model.

Reference:

P. Vecchiotti, N. Ma, S. Squartini, and G. J. Brown, "End-to-end binaural sound localisation from the raw waveform," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 451–455.

Only the WaveLoc-GTF variant is implemented.

Model

Training

Dataset

  • BRIR

    Surrey binaural room impulse response (BRIR) database, including an anechoic room and 4 reverberant rooms.

    | Room | A | B | C | D |
    |------|---|---|---|---|
    | RT60 (s) | 0.32 | 0.47 | 0.68 | 0.89 |
    | DRR (dB) | 6.09 | 5.31 | 8.82 | 6.12 |
  • Sound source

    TIMIT database

    Sentences per azimuth

    | Train | Validate | Evaluate |
    |-------|----------|----------|
    | 24 | 6 | 15 |
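The dataset is produced by spatializing TIMIT sentences with the Surrey BRIRs. A minimal sketch of that step (the function name and the `(n_taps, 2)` BRIR layout are illustrative assumptions, not taken from this repository):

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(speech, brir):
    """Convolve a mono speech signal with a binaural room impulse
    response (BRIR) to obtain a two-channel binaural signal.

    speech: 1-D array of speech samples
    brir:   2-D array of shape (n_taps, 2), left/right impulse responses
    """
    left = fftconvolve(speech, brir[:, 0])
    right = fftconvolve(speech, brir[:, 1])
    # Stack the two ear signals into shape (n_samples, 2)
    return np.stack([left, right], axis=1)
```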

Multi-conditional training (MCT)

For each reverberant room, the remaining 3 reverberant rooms and the anechoic room are used for training.
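The MCT split described above can be sketched as a simple leave-one-room-out helper (the room labels follow the table above; the function itself is illustrative, not this repository's API):

```python
# Multi-conditional training: to evaluate on one reverberant room,
# train on the anechoic room plus the other three reverberant rooms.
ROOMS = ['A', 'B', 'C', 'D']  # Surrey reverberant rooms

def mct_split(test_room):
    """Return (training rooms, test room) for multi-conditional training."""
    train_rooms = ['anechoic'] + [r for r in ROOMS if r != test_room]
    return train_rooms, test_room
```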

Training curves

Evaluation

Root mean square error (RMSE) is used as the performance metric. For each reverberant room, the evaluation was run 3 times, regenerating the test dataset each time, to obtain more stable results.
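For concreteness, the RMSE over a set of azimuth estimates is just the square root of the mean squared deviation from the ground truth (the helper name is illustrative):

```python
import numpy as np

def azimuth_rmse(est_deg, true_deg):
    """Root-mean-square error between estimated and true azimuths (degrees)."""
    est = np.asarray(est_deg, dtype=float)
    true = np.asarray(true_deg, dtype=float)
    return float(np.sqrt(np.mean((est - true) ** 2)))
```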

Since the binaural signal is fed directly to the model without extra preprocessing, and speech may contain short pauses, localization results are reported per chunk rather than per frame. Each chunk consists of 25 consecutive frames.
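One plausible way to turn frame-level outputs into the chunk-level decisions described above is to average the model's frame posteriors over each non-overlapping 25-frame chunk and pick the most likely azimuth; this averaging rule is an assumption for illustration, not necessarily the exact rule used here:

```python
import numpy as np

CHUNK_LEN = 25  # frames per chunk, as in the evaluation above

def chunk_decisions(frame_probs, azimuths):
    """Average frame-level posteriors over non-overlapping 25-frame
    chunks and pick the azimuth with the highest mean probability.

    frame_probs: array of shape (n_frames, n_azimuths)
    azimuths:    array of candidate azimuths in degrees
    """
    n_chunks = frame_probs.shape[0] // CHUNK_LEN
    estimates = []
    for i in range(n_chunks):
        chunk = frame_probs[i * CHUNK_LEN:(i + 1) * CHUNK_LEN]
        # Mean posterior over the chunk, then argmax over azimuths
        estimates.append(azimuths[np.argmax(chunk.mean(axis=0))])
    return np.array(estimates)
```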

My result vs. paper

| Reverberant room | A | B | C | D |
|------------------|---|---|---|---|
| My result (RMSE, °) | 1.5 | 2.0 | 1.4 | 2.7 |
| Result in paper (RMSE, °) | 1.5 | 3.0 | 1.7 | 3.5 |

Main dependencies

  • tensorflow-1.14
  • pysofa (can be installed by pip)
  • BasicTools (in my other repository)