Audio-visual-sound-localization

Source code for the ICASSP 2021 paper "Multi-target DoA Estimation with an Audio-visual Fusion Mechanism".

  1. Clone this repository

    git clone https://github.com/catherine-qian/Audio-visual-sound-localization.git

  2. Download the extracted features (feature extraction source code) from

    https://drive.google.com/drive/folders/1wDa3MNqVcYJ76uV2SQR1ZsaOzQ7fpDo_?usp=share_link

    and put the features under data/

    (you may specify the datapath in dataread.py)

  3. Run the following command to get the results

    python main_sslr.py -model 'MLP3'
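
The steps above can be sketched as a small pre-flight script that checks for downloaded features before launching the run. This is a minimal sketch, not part of the repository: the helper names `find_feature_files`, `build_command`, and `run_if_ready` are hypothetical, and it assumes you run it from the repository root with the features stored somewhere under data/ (the default location mentioned in step 2).

```python
import subprocess
import sys
from pathlib import Path


def find_feature_files(data_dir="data"):
    """Return a sorted list of all files found (recursively) under data_dir.

    Hypothetical helper: it only checks that files exist; it does not
    validate their names or contents.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def build_command(model="MLP3"):
    """Build the step 3 command as an argument list for subprocess."""
    # sys.executable is the interpreter running this script.
    return [sys.executable, "main_sslr.py", "-model", model]


def run_if_ready(data_dir="data", model="MLP3"):
    """Run main_sslr.py only if data_dir contains at least one file.

    Returns False without running anything when no files are found.
    """
    files = find_feature_files(data_dir)
    if not files:
        print(f"No files found under {data_dir}/; download the features first.")
        return False
    print(f"Found {len(files)} file(s) under {data_dir}/")
    subprocess.run(build_command(model), check=True)
    return True
```

Calling `run_if_ready()` from the repository root is then equivalent to running the command in step 3, except that it stops early with a message when data/ is missing or empty.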


If you use this code, please cite:

    @inproceedings{qian2021multi,
      title={Multi-target DoA Estimation with an Audio-visual Fusion Mechanism},
      author={Qian, Xinyuan and Madhavi, Maulik and Pan, Zexu and Wang, Jiadong and Li, Haizhou},
      booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
      pages={4280--4284},
      year={2021},
      organization={IEEE}
    }