Audio-visual-sound-localization

Source code for the ICASSP 2021 paper "Multi-target DoA Estimation with an Audio-visual Fusion Mechanism".

Clone this repository

git clone https://github.com/catherine-qian/Audio-visual-sound-localization.git

Download the extracted features (feature extraction source code) from

https://drive.google.com/drive/folders/1wDa3MNqVcYJ76uV2SQR1ZsaOzQ7fpDo_?usp=share_link

and put the features under data/ (or set a different data path in dataread.py).

Run the following command to get the results

python main_sslr.py -model 'MLP3'
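
Before running the command above, it can help to confirm that the downloaded features actually ended up under data/. The following is a minimal sketch and not part of the repository: the helper name check_feature_dir is hypothetical, and it only checks that the directory exists and contains files, since the expected feature file names are not listed here.

```python
from pathlib import Path


def check_feature_dir(data_dir="data"):
    """Return the sorted relative paths of all files under data_dir.

    Hypothetical helper (not part of this repository). Raises
    FileNotFoundError if the directory is missing or contains no files.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} does not exist; create it and copy the downloaded features there"
        )
    # Walk the directory recursively and keep regular files only.
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    if not files:
        raise FileNotFoundError(f"{root} is empty; copy the downloaded features there")
    return files
```

Calling check_feature_dir("data") from the repository root returns the list of files it found, which can be compared by eye against the contents of the download folder. If the data path was changed in dataread.py, pass that path instead.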

If you use this code, please cite:

@inproceedings{qian2021multi,
  title={Multi-target DoA Estimation with an Audio-visual Fusion Mechanism},
  author={Qian, Xinyuan and Madhavi, Maulik and Pan, Zexu and Wang, Jiadong and Li, Haizhou},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={4280--4284},
  year={2021},
  organization={IEEE}
}
