Code for the paper "An end-to-end binaural sound localization model based on the equalization and cancellation theory", accepted at the 152nd AES Convention.
Dependencies:
- Python 3.7
- tensorflow-gpu 2.4
- gammatone: implementation of gammatone filters
- BasicTools: a collection of scripts for basic operations such as plotting and reading and writing wave files
- LocTools: scripts for processing localization logs
- RoomSimulator: implementation of the image method for BRIR simulation
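The gammatone module is only listed by name here. As background, the following is a minimal sketch of a sampled 4th-order gammatone impulse response, g(t) = t^(n-1) exp(-2*pi*b*t) cos(2*pi*f_c*t), with b = 1.019 * ERB(f_c) and the Glasberg and Moore ERB formula. The names erb and gammatone_ir are hypothetical and are not the API of the repository's gammatone module, which may use a different normalization or phase.

```python
import math
from typing import List


def erb(fc_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz: float, fs_hz: int, duration_s: float = 0.05,
                 order: int = 4) -> List[float]:
    """Sampled gammatone impulse response, scaled to unit peak magnitude.

    Hypothetical helper for illustration only.
    """
    b = 1.019 * erb(fc_hz)  # envelope decay rate, tied to the ERB at fc
    n_samples = int(round(duration_s * fs_hz))
    ir = []
    for i in range(n_samples):
        t = i / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(x) for x in ir)
    return [x / peak for x in ir]


if __name__ == "__main__":
    ir = gammatone_ir(1000.0, 16000)  # 50 ms at 16 kHz
    print(len(ir), round(erb(1000.0), 3))
```

A filterbank is obtained by repeating this for centre frequencies spaced evenly on the ERB scale and convolving each ear signal with every filter.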
Datasets:
- TIMIT¹
- RealRoomBRIRs
Synthesize the BRIRs (each step below is run from the repository root):
cd syn_BRIRs
python syn_BRIRs.py
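The BRIR synthesis presumably relies on the image method implemented in RoomSimulator. As background, here is a minimal sketch for a shoebox room that computes the delay and amplitude of the direct path and the six first-order reflections. The wall reflection coefficient beta, the speed of sound, and all function names are assumptions for illustration; a real BRIR simulator also includes higher-order images and filters each arrival with the head-related impulse response for its direction, both of which are omitted here.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def first_order_images(src: Vec3, room: Vec3) -> List[Vec3]:
    """Mirror the source in each of the six walls of a shoebox room.

    The room spans [0, Lx] x [0, Ly] x [0, Lz], so along each axis the wall
    planes sit at coordinate 0 and at the room dimension.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]  # reflect across the wall plane
            images.append((img[0], img[1], img[2]))
    return images


def arrivals(src: Vec3, rcv: Vec3, room: Vec3, beta: float = 0.8,
             c: float = 343.0) -> List[Tuple[float, float]]:
    """(delay in s, amplitude) of the direct path and first-order reflections.

    Amplitude follows 1/r spreading; one wall bounce scales it by beta.
    """
    paths = [(src, 1.0)] + [(img, beta) for img in first_order_images(src, room)]
    out = []
    for pos, refl in paths:
        r = math.sqrt(sum((p - q) ** 2 for p, q in zip(pos, rcv)))
        out.append((r / c, refl / r))
    return sorted(out)  # earliest arrival first
```

Each (delay, amplitude) pair places one scaled impulse in the room impulse response; the direct path always arrives first because every image is farther from the receiver than the source.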
Generate the training and validation datasets:
cd gen_dataset
python gen_train_valid_dataset.py
Generate the test datasets with one, two and three sound sources:
cd gen_dataset
python gen_test_1src_dataset.py
python gen_test_2src_dataset.py
python gen_test_3src_dataset.py
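The 1-, 2- and 3-source test sets differ in how many spatialized talkers are mixed. A common way to build such binaural mixtures is to convolve each mono utterance with the left and right BRIRs for its position and add the results ear by ear. The sketch below shows only that step; it is an assumption about what the gen_dataset scripts do, not code taken from them, and it ignores level normalization, onset alignment and file I/O. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution (output length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_into(acc: List[float], sig: Sequence[float]) -> None:
    """Add sig to acc in place, zero-padding acc if sig is longer."""
    if len(sig) > len(acc):
        acc.extend([0.0] * (len(sig) - len(acc)))
    for k, v in enumerate(sig):
        acc[k] += v


def spatialize(sources: Sequence[Sequence[float]],
               brirs: Sequence[Tuple[Sequence[float], Sequence[float]]]
               ) -> Tuple[List[float], List[float]]:
    """Render each mono source through its (left, right) BRIR and sum per ear."""
    left: List[float] = []
    right: List[float] = []
    for src, (h_left, h_right) in zip(sources, brirs):
        add_into(left, convolve(src, h_left))
        add_into(right, convolve(src, h_right))
    return left, right
```

For realistic signal lengths an FFT-based routine such as scipy.signal.fftconvolve is much faster than this direct O(N*M) loop.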
Three model scripts are provided:
- WaveLoc_EC: the basic model
python train-var_feat_filter_len-specify_cnn_f.py \
--model_script_path WaveLoc_EC.py \
--file_reader_script_path file_read.py \
--n_feat_filter 32 \
--feat_filter_len 32 \
--cnn_coef 1 \
--model_dir exp/WaveLoc_EC \
--train_file_list Data/train/reverb/record_paths.txt \
--valid_file_list Data/train/reverb/record_paths.txt \
--gpu_id 0
Here --n_feat_filter is the number of channels of the CNN_feat kernel, --feat_filter_len is the length of the CNN_feat kernel, and --cnn_coef is a parameter of the CNN_feat activation function.
- WaveLoc-EC-wav_pool: adds a single max-pooling layer after the normalization layer
python train-var_feat_filter_len-specify_cnn_f.py \
--model_script_path WaveLoc_EC.py \
--file_reader_script_path file_read.py \
--n_feat_filter 32 \
--feat_filter_len 32 \
--cnn_coef 1 \
--pool_len 2 \
--model_dir exp/WaveLoc_EC-wav_pool_2 \
--train_file_list Data/train/reverb/record_paths.txt \
--valid_file_list Data/train/reverb/record_paths.txt \
--gpu_id 0
Here --pool_len is the pooling size; the remaining options are the same as for WaveLoc_EC.
- WaveLoc-EC-feat_pool: adds a single max-pooling layer after the CNN layer that extracts features from the binaural signals
python train-var_feat_filter_len-specify_cnn_f.py \
--model_script_path WaveLoc_EC.py \
--file_reader_script_path file_read.py \
--n_feat_filter 32 \
--feat_filter_len 32 \
--cnn_coef 1 \
--pool_len 2 \
--model_dir exp/WaveLoc_EC-feat_pool_2 \
--train_file_list Data/train/reverb/record_paths.txt \
--valid_file_list Data/train/reverb/record_paths.txt \
--gpu_id 0
Here --pool_len is the pooling size; the remaining options are the same as for WaveLoc_EC.
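For intuition about the EC theory that the models are named after: in Durlach's equalization-cancellation model, one ear's signal is time-shifted and scaled to match the other (equalization) and then subtracted (cancellation), and the interaural delay that cancels the most energy indicates the source direction. The sketch below is that classic single-band, integer-lag version. It is not the WaveLoc_EC network, which according to the paper title learns this kind of processing end-to-end, and the function names are hypothetical.

```python
import random
from typing import Sequence


def ec_residual(left: Sequence[float], right: Sequence[float], lag: int) -> float:
    """Fraction of left-ear energy remaining after equalization and cancellation.

    The right channel is delayed by `lag` samples (equalization in time),
    scaled by the least-squares gain (equalization in level) and subtracted
    from the left channel (cancellation). Equals 1 - rho**2, where rho is the
    normalized correlation over the overlapping samples.
    """
    pairs = [(left[n], right[n - lag]) for n in range(len(left))
             if 0 <= n - lag < len(right)]
    ll = sum(l * l for l, _ in pairs)
    rr = sum(r * r for _, r in pairs)
    lr = sum(l * r for l, r in pairs)
    if ll == 0.0 or rr == 0.0:
        return 1.0  # degenerate overlap: treat as no cancellation
    gain = lr / rr  # minimizes sum((l - gain * r) ** 2)
    return (ll - gain * lr) / ll


def ec_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Lag in samples with the smallest EC residual; positive means the right ear leads."""
    return min(range(-max_lag, max_lag + 1),
               key=lambda lag: ec_residual(left, right, lag))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    right_ear = x
    left_ear = [0.0] * 5 + x[:-5]  # left ear hears the same signal 5 samples later
    print(ec_lag(left_ear, right_ear, max_lag=16))  # expected: 5
```

Dividing the lag by the sampling rate gives the interaural time difference; practical EC models apply this per frequency band (for example, after a gammatone filterbank) rather than to the broadband signal.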
To evaluate a trained model:
bash run-evaluate-realroom.sh --model_script WaveLoc_EC.py \
--model_dir exp/WaveLoc_EC \
--n_src 1 \
--chunksize 1
Here --n_src is the number of sound sources (1 to 3) and --chunksize is the number of successive frames of model output to be averaged.
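As a rough illustration of what --chunksize controls, the sketch below assumes the model outputs, for every frame, a probability for each azimuth class. It averages these over non-overlapping blocks of chunksize successive frames and then takes the n_src most probable classes per block. The block layout and the simple top-n selection are assumptions; run-evaluate-realroom.sh may use sliding windows or peak picking instead, and the helper names are hypothetical.

```python
from typing import List, Sequence


def chunk_average(frame_probs: Sequence[Sequence[float]],
                  chunksize: int) -> List[List[float]]:
    """Average per-frame azimuth probabilities over non-overlapping chunks.

    A final chunk shorter than chunksize is averaged over the frames it has.
    """
    chunks = []
    for start in range(0, len(frame_probs), chunksize):
        block = frame_probs[start:start + chunksize]
        n_classes = len(block[0])
        chunks.append([sum(frame[k] for frame in block) / len(block)
                       for k in range(n_classes)])
    return chunks


def top_n_azimuths(probs: Sequence[float], n_src: int) -> List[int]:
    """Indices of the n_src most probable azimuth classes, most probable first."""
    return sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)[:n_src]


if __name__ == "__main__":
    frames = [[0.1, 0.7, 0.2], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]]
    for chunk in chunk_average(frames, chunksize=2):
        print(top_n_azimuths(chunk, n_src=1))
```

A larger chunk smooths out frame-level errors at the cost of time resolution; with chunksize 1 every frame is scored on its own.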
Footnotes
1. Audio files in TIMIT have the suffix ".WAV". Convert *.WAV to *.wav before generating any dataset.
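If your TIMIT copy differs only in letter case, the conversion is a rename. The following minimal sketch walks a TIMIT directory and renames every *.WAV file to *.wav, leaving the transcription files (.PHN, .WRD, .TXT) untouched; the function name is hypothetical. Note that the original LDC release stores audio in NIST SPHERE format, so if the dataset scripts read the files with a standard RIFF/WAV reader, a rename alone is not enough and the audio itself must be converted (for example with LDC's sph2pipe tool).

```python
import os
import sys


def lowercase_wav_suffix(root: str) -> int:
    """Rename every *.WAV file under root to *.wav; return how many were renamed."""
    renamed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext == ".WAV":
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, stem + ".wav"))
                renamed += 1
    return renamed


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python <this_script>.py /path/to/TIMIT
    print(lowercase_wav_suffix(sys.argv[1]), "files renamed")
```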