/visual-speech-separation

Flask app to demo multimodal deep learning speech separation in videos via TensorFlow Serving

Primary LanguagePythonApache License 2.0Apache-2.0

About Basis

Basis is a proof-of-concept web app which provides an interactive demonstration of separating on/off-screen audio sources for a given video.

It leverages the speech separation model created by Andrew Owens et al. used for separating on / off-screen audio sources. This project is based upon open-source code and models licensed under the Apache License 2.0.

I built this as an opportunity to learn:

  • Implementation details of a "legacy" TensorFlow 1.x model
  • How to freeze, inspect, and host a model using TF-Serving
  • How to perform real-time inferencing on video from a public web app