AudioVisual Speaker Localization

A framework for localizing speakers in video streams by means of deep learning topologies.

This repo includes implementations for processing video streams and for localizing faces and mouths, along with a fast visual voice activity detector. For implementations of the ETi voice activity detector and the Conv2DLSTM-based visual voice activity detector, please visit the following repos:
http://github.com/lvrysis/Audio-DNN-Classification
http://github.com/lvrysis/Audio-Feature-Integration

The implementations are written in Python.
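
As a rough illustration of what a fast visual voice activity detector can look like, here is a minimal sketch that flags a frame as "speaking" when the mouth region changes enough between consecutive frames. This is not the code from this repo, and it swaps the deep learning approach for plain frame differencing. The names `mouth_motion_energy` and `visual_vad`, the threshold value, and the toy 2x2 mouth crops are all assumptions made for the example.

```python
from typing import List, Sequence

# Assumption: a mouth crop is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def mouth_motion_energy(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two equally sized mouth crops."""
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def visual_vad(frames: Sequence[Frame], threshold: float = 5.0) -> List[bool]:
    """Flag each frame as 'speaking' when mouth motion exceeds the threshold.

    The first frame has no predecessor, so it is always flagged False.
    The threshold is a hypothetical value and would need tuning on real data.
    """
    decisions = [False] * len(frames)
    for i in range(1, len(frames)):
        energy = mouth_motion_energy(frames[i - 1], frames[i])
        decisions[i] = energy > threshold
    return decisions


# Toy 2x2 "mouth crops": two identical frames, then one with a large change.
still = [[10, 10], [10, 10]]
moved = [[40, 40], [10, 10]]
example = visual_vad([still, still, moved])
```

In a real pipeline the mouth crops would come from the face and mouth localization step, and a learned classifier would typically replace the fixed threshold.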

lvrysis/AudioVisual-Speaker-Localization