AudioVisual Speaker Localization
A framework for localizing speakers in video streams by means of deep learning topologies.
This repo includes implementations for processing video streams and localizing faces and mouths, as well as a fast visual voice activity detector. Please visit the following repos for implementations of the ETi voice activity detector and the Conv2DLSTM-based visual voice activity detector:
http://github.com/lvrysis/Audio-DNN-Classification
http://github.com/lvrysis/Audio-Feature-Integration
The implementations are written in Python.
You can experiment using the M3C Speaker Localization datasets:
http://research.playcompass.com/files/M3C-Speaker-Localization-1.zip
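Once the archive is downloaded, it can be unpacked from Python. The following is a minimal sketch using only the standard library; the helper name extract_dataset and the destination folder are illustrative assumptions, not part of this repo, and the internal layout of the archive is not documented here, so the helper simply returns whatever files it finds.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract a downloaded dataset archive and return the extracted file paths.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Collect file entries only; directory entries are skipped.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
        archive.extractall(dest_dir)
    return sorted(dest_dir / name for name in names)
```

For example, extract_dataset("M3C-Speaker-Localization-1.zip", "data/M3C") would unpack the archive into data/M3C, assuming the zip file sits in the current working directory.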