Visual Speech Enhancement

Implementation of the method described in the paper Visual Speech Enhancement by Aviv Gabbay, Asaph Shamir, and Shmuel Peleg (Interspeech 2018).

Speech Enhancement Demo

Usage

Dependencies

  • python >= 2.7
  • mediaio
  • face-detection
  • keras >= 2.0.4
  • numpy >= 1.12.1
  • dlib >= 19.4.0
  • opencv >= 3.2.0
  • librosa >= 0.5.1
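Most of these packages are available on PyPI and can be installed with pip. The commands below are a sketch, not part of this README: opencv is assumed to map to the opencv-python wheel, and mediaio and face-detection are assumed to be the authors' own utility packages installed from their GitHub repositories.

# PyPI packages (version pins follow the list above)
pip install "keras>=2.0.4" "numpy>=1.12.1" "dlib>=19.4.0" "librosa>=0.5.1" "opencv-python>=3.2.0"

# mediaio and face-detection are the authors' utility packages;
# the GitHub locations below are an assumption
pip install git+https://github.com/avivga/mediaio.git
pip install git+https://github.com/avivga/face-detection.git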

Getting started

Given an audio-visual dataset with the following directory structure:

├── speaker-1
|   ├── audio
|   |   ├── f1.wav
|   |   └── f2.wav
|   └── video
|       ├── f1.mp4
|       └── f2.mp4
├── speaker-2
|   ├── audio
|   |   ├── f1.wav
|   |   └── f2.wav
|   └── video
|       ├── f1.mp4
|       └── f2.mp4
...

and a noise directory containing noise samples as audio files (*.wav), perform the following steps.

Preprocess the train, validation, and test datasets separately by running:

speech_enhancer.py --base_dir <output-dir-path> preprocess
    --data_name <preprocessed-data-name>
    --dataset_dir <dataset-dir-path>
    --noise_dirs <noise-dir-path> ...
    [--speakers <speaker-id> ...]
    [--ignored_speakers <speaker-id> ...] 
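For example, the following invocation preprocesses a training split using two noise directories. All paths and the data name here are placeholders for illustration, not files shipped with the repository:

speech_enhancer.py --base_dir out preprocess \
    --data_name train \
    --dataset_dir datasets/audio-visual/train \
    --noise_dirs noise/cafe noise/street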

Then, train the model by running:

speech_enhancer.py --base_dir <output-dir-path> train
    --model <model-name>
    --train_data_names <preprocessed-training-data-name> ...
    --validation_data_names <preprocessed-validation-data-name> ...
    [--gpus <num-of-gpus>]
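For example, assuming a training and a validation split were preprocessed as above (the model name and data names are placeholders):

speech_enhancer.py --base_dir out train \
    --model vse \
    --train_data_names train \
    --validation_data_names validation \
    --gpus 1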

Finally, enhance the noisy test speech samples by running:

speech_enhancer.py --base_dir <output-dir-path> predict
    --model <model-name>
    --data_name <preprocessed-test-data-name>
    [--gpus <num-of-gpus>]
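For example, again with the placeholder names used above, to enhance the preprocessed test split with the trained model:

speech_enhancer.py --base_dir out predict \
    --model vse \
    --data_name test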

Citing

If you find this project useful for your research, please cite:

@inproceedings{gabbay2018visual,
  author    = {Aviv Gabbay and
               Asaph Shamir and
               Shmuel Peleg},
  title     = {Visual Speech Enhancement},
  booktitle = {Interspeech},
  pages     = {1170--1174},
  publisher = {{ISCA}},
  year      = {2018}
}