TASLP2022-AVRI

An audio-visual speaker tracking method for robots.

This repo is the official implementation of 'Audio-Visual Cross-Attention Network for Robotic Speaker Tracking', TASLP 2022.


You can download the constructed features from https://drive.google.com/drive/folders/1mLgvflJ2MKYz2WIZAx5H_XStYwSMc88I?usp=share_link and place them under data/


To run the source code, simply run:

python hritrain.py -model [model name] -datapath [data path]
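For example, a concrete invocation might look like the following; the model name "avcan" is a placeholder assumption (it is not taken from this repo), so substitute the model name and data path for your own setup:

python hritrain.py -model avcan -datapath data/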


The raw data will be released soon.

(To protect personal privacy, the raw face images will not be released.)

Citation

@ARTICLE{qian2023avri,
  author={Qian, Xinyuan and Wang, Zhengdong and Wang, Jiadong and Guan, Guohui and Li, Haizhou},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Audio-Visual Cross-Attention Network for Robotic Speaker Tracking}, 
  year={2023},
  volume={31},
  number={},
  pages={550-562},
  doi={10.1109/TASLP.2022.3226330}}