/avsec_preprocessing

Scripts for preprocessing the COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSEC) data

Scripts for preprocessing audio-visual speech enhancement challenge (AVSEC) data

The main.py script can be used to extract the following features:

  • FaceMesh landmarks [1]
  • lip images cropped using the landmarks
  • face embeddings using FaceNet [2]
  • lip embeddings using TCN [3]
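
To illustrate how lip images can be derived from landmarks, here is a minimal sketch of a lip crop box computation. It assumes the lip landmarks are given as normalized (x, y) pairs in [0, 1], which is how MediaPipe FaceMesh reports coordinates. The function name `lip_bounding_box` and the `margin` parameter are hypothetical and are not taken from this repository.

```python
import math
from typing import Sequence, Tuple


def lip_bounding_box(
    lip_points: Sequence[Tuple[float, float]],
    image_width: int,
    image_height: int,
    margin: float = 0.25,
) -> Tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) pixel box around the lip landmarks.

    lip_points holds normalized (x, y) coordinates in [0, 1] (assumption).
    margin pads the tight box by a fraction of its width and height.
    """
    if not lip_points:
        raise ValueError("lip_points must contain at least one landmark")

    # Convert normalized coordinates to pixel coordinates.
    xs = [x * image_width for x, _ in lip_points]
    ys = [y * image_height for _, y in lip_points]

    # Pad the tight box proportionally to its size.
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))

    # Clamp the padded box to the image bounds.
    left = max(0, math.floor(min(xs) - pad_x))
    top = max(0, math.floor(min(ys) - pad_y))
    right = min(image_width, math.ceil(max(xs) + pad_x))
    bottom = min(image_height, math.ceil(max(ys) + pad_y))
    return left, top, right, bottom
```

With a frame stored as a height-by-width array, the crop would then be `frame[top:bottom, left:right]`.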

Requirements

## CPU 
pip install -r requirements.txt

## GPU
pip install -r requirements_gpu.txt

## Apple Silicon
pip install -r requirements_mac.txt
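
If the environment is set up from a script, the choice between the three requirements files can be sketched as below. This is only an illustration: the helper `pick_requirements` is hypothetical, and whether a GPU is available is passed in as a flag because detecting one is outside the scope of this sketch.

```python
import platform


def pick_requirements(system: str, machine: str, has_gpu: bool = False) -> str:
    """Map a platform description to one of the three requirements files."""
    # Apple Silicon Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "requirements_mac.txt"
    # has_gpu is supplied by the caller (assumption: no auto-detection here).
    if has_gpu:
        return "requirements_gpu.txt"
    return "requirements.txt"
```

For the current machine, call `pick_requirements(platform.system(), platform.machine())` and pass the result to `pip install -r`.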

Usage

python main.py --data-dir ./data/train/scenes \
               --save-dir ./preprocessed/train \
               --models-root ./models \
               --all-feat
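
To preprocess several dataset splits in one go, the command above can be wrapped in a small driver. The sketch below reuses only the flags shown in the usage example. It assumes that other splits follow the same `./data/<split>/scenes` layout as `train`, which is not confirmed by this README, and the function names are hypothetical.

```python
import subprocess
import sys
from typing import Iterable, List


def build_command(
    split: str,
    data_root: str = "./data",
    save_root: str = "./preprocessed",
    models_root: str = "./models",
) -> List[str]:
    """Build the main.py invocation for one split, mirroring the usage example."""
    return [
        sys.executable, "main.py",
        "--data-dir", f"{data_root}/{split}/scenes",  # assumed layout for non-train splits
        "--save-dir", f"{save_root}/{split}",
        "--models-root", models_root,
        "--all-feat",
    ]


def preprocess_splits(splits: Iterable[str]) -> None:
    """Run main.py once per split and stop at the first failure."""
    for split in splits:
        subprocess.run(build_command(split), check=True)
```

For example, `preprocess_splits(["train"])` runs the same command as the usage example, using the current Python interpreter.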

References