Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain

This repository contains the codebase accompanying the paper:

Julio Wissing, Benedikt Bönninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, "Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain", accepted for ICASSP 2021

The face tracking used to generate the input features relies on the YOLOv3 algorithm, as provided by the yoloface repository. Install yoloface before generating input features for the spatial stream weighting.