GCC-PHAT_DNN_Loc: A Python repository from YangangCao

GCC-PHAT based DNN localization method

Baseline system in END-TO-END BINAURAL SOUND LOCALISATION FROM THE RAW WAVEFORM¹

Framework

Dataset

Binaural signals are synthesized using BRIRs.
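As a rough illustration of this synthesis step, a mono source can be convolved with the left and right channels of a BRIR. This is a minimal sketch, not the repository's code: the function name `synthesize_binaural` and the `(n_taps, 2)` BRIR layout are assumptions made for the example.

```python
import numpy as np

def synthesize_binaural(source, brir):
    """Convolve a mono signal with a two-channel BRIR.

    source: 1-D array, shape (n_samples,)
    brir:   2-D array, shape (n_taps, 2), columns = left/right ear (assumed layout)
    returns an array of shape (n_samples + n_taps - 1, 2)
    """
    left = np.convolve(source, brir[:, 0])
    right = np.convolve(source, brir[:, 1])
    return np.stack([left, right], axis=1)

# Toy example: a unit impulse as source; the right ear receives it
# 2 samples later and attenuated by half.
source = np.zeros(8)
source[0] = 1.0
brir = np.zeros((4, 2))
brir[0, 0] = 1.0
brir[2, 1] = 0.5
binaural = synthesize_binaural(source, brir)
```

In practice the source would be a TIMIT sentence and the BRIR would be read from the Surrey database for one room and azimuth.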

BRIRs

Surrey binaural room impulse response (BRIR) database, which includes an anechoic room and 4 reverberant rooms.

Room	A	B	C	D
RT_60(s)	0.32	0.47	0.68	0.89
DRR(dB)	6.09	5.31	8.82	6.12

Sound source

TIMIT sentences

Sentences per azimuth

Train	Validate	Evaluate
24	6	15

Cue extractor

Normally, features are normalized before being fed into the network. If each feature dimension is an independent variable, normalization is applied to each dimension separately. For GCC-PHAT, however, what matters is the peak position, in other words the relative values across dimensions, so the same normalization coefficient should be used for all dimensions.
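To make the role of the peak position concrete, here is a minimal GCC-PHAT sketch in NumPy. It is not the repository's cue extractor: the function name `gcc_phat`, the `max_lag` parameter and the zero-padding choice are assumptions for illustration.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """GCC-PHAT of x and y for lags -max_lag..max_lag (in samples).

    A peak at a positive lag d means x lags y by d samples.
    """
    n = len(x) + len(y)              # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag
    return np.roll(cc, max_lag)[:2 * max_lag + 1]

# Toy example: x is y delayed by 3 samples
rng = np.random.default_rng(0)
y = rng.standard_normal(256)
x = np.concatenate([np.zeros(3), y[:-3]])
max_lag = 16
feature = gcc_phat(x, y, max_lag)
lags = np.arange(-max_lag, max_lag + 1)
peak_lag = int(lags[np.argmax(feature)])
```

The delay is encoded in which dimension holds the maximum, so any normalization that changes the ordering of values within one feature vector can move the apparent peak.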

Two types of normalization are tested here:

separate_norm: each dimension is normalized separately
overall_norm: all dimensions are normalized with the same factor

E.g. (figure: GCC-PHAT features after separate_norm and after overall_norm)
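The difference between the two schemes can be sketched as follows. This assumes mean/standard-deviation normalization, which the text above does not specify; the function bodies are illustrative and not taken from the repository.

```python
import numpy as np

def separate_norm(feats):
    """Normalize each feature dimension (column) with its own mean and std."""
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + 1e-12)

def overall_norm(feats):
    """Normalize all dimensions with one shared mean and std."""
    return (feats - feats.mean()) / (feats.std() + 1e-12)

# Three toy feature vectors whose peak is always in the last dimension
feats = np.array([[0.1, 0.2, 0.9],
                  [0.3, 0.1, 0.8],
                  [0.2, 0.3, 0.7]])

peaks_original = feats.argmax(axis=1)
peaks_separate = separate_norm(feats).argmax(axis=1)
peaks_overall = overall_norm(feats).argmax(axis=1)
```

With a shared factor, every vector is transformed by the same increasing affine map, so the peak stays where it was. With per-dimension statistics, the peak can move to another dimension, as it does for the second and third toy vectors here.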

Model training

Multi-conditional training (MCT)

Each time, 1 reverberant room is held out and used for evaluation, while the other 3 reverberant rooms and the anechoic room are used for model training.
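The resulting train/evaluation splits can be enumerated with a short sketch. The room labels and the helper name `mct_splits` are placeholders chosen for this example.

```python
# Room labels are placeholders for illustration
ROOMS = ["Anechoic", "A", "B", "C", "D"]
REVERB_ROOMS = ["A", "B", "C", "D"]

def mct_splits():
    """Yield (test_room, train_rooms) with one reverberant room held out."""
    for test_room in REVERB_ROOMS:
        train_rooms = [room for room in ROOMS if room != test_room]
        yield test_room, train_rooms

splits = dict(mct_splits())
for test_room, train_rooms in splits.items():
    print(f"evaluate on {test_room}, train on {train_rooms}")
```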

Evaluation

Localization results are reported every 25 frames to account for silent frames. The RMSE of the estimated source azimuth is used as the performance metric. For more stable results, the evaluation is run on 4 different test sets and the RMSEs are averaged (this is not done in the reference paper).
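One possible way to implement chunk-level reporting is sketched below. It assumes that the network outputs per-frame probabilities over azimuth classes and that these are averaged within each 25-frame chunk before taking the argmax. The text above does not specify the pooling rule, and the azimuth grid in the example is also an assumption.

```python
import numpy as np

def chunk_azimuth_estimates(frame_probs, azimuths, chunk_len=25):
    """Average frame-level probabilities per chunk and pick the best azimuth.

    frame_probs: array of shape (n_frames, n_azimuths)
    azimuths:    array of shape (n_azimuths,), in degrees
    """
    n_chunks = len(frame_probs) // chunk_len   # drop an incomplete last chunk
    probs = frame_probs[:n_chunks * chunk_len]
    probs = probs.reshape(n_chunks, chunk_len, -1).mean(axis=1)
    return azimuths[probs.argmax(axis=1)]

def rmse(estimates, targets):
    """Root-mean-square error between estimated and true azimuths."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example with an assumed azimuth grid of -90..90 degrees in 5-degree steps
azimuths = np.arange(-90, 91, 5)
frame_probs = np.zeros((50, len(azimuths)))
frame_probs[:25, 18] = 1.0   # first chunk points to 0 degrees
frame_probs[25:, 20] = 1.0   # second chunk points to 10 degrees
estimates = chunk_azimuth_estimates(frame_probs, azimuths)
error = rmse(estimates, [0, 5])
```

Averaging over several test sets then amounts to taking the mean of the per-set RMSE values.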

	A	B	C	D
Paper	2.7	3.3	3.1	5.2
separate_norm	0.5	1.6	1.1	3.3
overall_norm	0.6	1.7	1.1	3.3

Stability of model training

For room D, the model was trained 3 times. Even though similar losses were achieved, the test results vary.

mean: 3.39 std: 0.07

Main Dependencies

python 3
tensorflow-1.14
pysofa https://github.com/bingo-todd/pySOFA
BasicTools https://github.com/bingo-todd/BasicTools

Generate dataset

Align BRIRs (optional): align the BRIRs of the reverberant rooms to the BRIRs of the anechoic room.
Synthesize spatial recordings
Calculate GCC-PHAT features
Calculate normalization coefficients of GCC-PHAT features
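The optional alignment step could, for instance, be done by estimating the delay between a reverberant BRIR and the anechoic BRIR from their cross-correlation peak and shifting accordingly. This is only a sketch under that assumption; the repository's actual alignment procedure is not described here, and the helper names and the use of the left channel as the reference are hypothetical.

```python
import numpy as np

def estimate_delay(ref, sig):
    """Delay (in samples) of sig relative to ref, from the cross-correlation peak."""
    cc = np.correlate(sig, ref, mode="full")
    return int(np.argmax(np.abs(cc))) - (len(ref) - 1)

def align_brir(brir, ref_brir):
    """Shift a two-channel BRIR so that it lines up with ref_brir.

    The delay is estimated on the left channel only (an arbitrary choice)
    and applied to both ears, so the interaural time difference is preserved.
    """
    delay = estimate_delay(ref_brir[:, 0], brir[:, 0])
    aligned = np.zeros_like(brir)
    if delay > 0:        # brir arrives late: shift it earlier
        aligned[:-delay] = brir[delay:]
    elif delay < 0:      # brir arrives early: shift it later
        aligned[-delay:] = brir[:delay]
    else:
        aligned[:] = brir
    return aligned

# Toy example: the "reverberant" BRIR is the reference delayed by 3 samples
ref_brir = np.zeros((16, 2))
ref_brir[2, 0] = 1.0
ref_brir[4, 1] = 1.0
brir = np.zeros((16, 2))
brir[5, 0] = 1.0
brir[7, 1] = 1.0
aligned = align_brir(brir, ref_brir)
```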

Reference

Vecchiotti, Paolo, Ning Ma, Stefano Squartini, and Guy J. Brown. "End-to-End Binaural Sound Localisation from the Raw Waveform." In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 451–55. IEEE, 2019.

