matthijsvk/multimodalSR

Multimodal speech recognition that combines lipreading (using CNNs) with audio (using LSTMs). An attention network performs the sensor fusion.
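
As a rough illustration of attention-based fusion in general, the sketch below weights one audio feature vector and one lip feature vector with softmax attention and sums them. It is not taken from this repository: the function names, the feature values, and the single scoring vector `score_weights` are all hypothetical, and a real model would learn the scoring parameters and would probably operate on sequences rather than single vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, score_weights):
    """Fuse same-length modality vectors with softmax attention.

    features:      list of feature vectors, one per modality (hypothetical)
    score_weights: vector used to score each modality (hypothetical; a real
                   attention network would learn this)
    Returns the fused vector and the attention weight of each modality.
    """
    # One scalar relevance score per modality: dot(score_weights, feature).
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in features]
    alphas = softmax(scores)
    dim = len(features[0])
    # Convex combination of the modality vectors, element by element.
    fused = [sum(a * f[i] for a, f in zip(alphas, features)) for i in range(dim)]
    return fused, alphas


# Made-up example inputs, standing in for LSTM (audio) and CNN (lip) outputs.
audio_feat = [0.2, 0.9, -0.1]
lip_feat = [0.5, 0.1, 0.3]
score_w = [1.0, 0.5, -0.5]

fused, alphas = attention_fuse([audio_feat, lip_feat], score_w)
```

Because the weights come from a softmax, they sum to one, so the fused vector always lies between the two modality vectors. The modality with the higher score contributes more to the result.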

Jupyter Notebook · MIT