Wav2Lip in Vector Quantized space

An unofficial implementation of "Towards Generating Ultra-High Resolution Talking-Face Videos with Lip Synchronization".
  • VQGAN
  • syncnet_vq.py
    • face_encoder: (B, T x 256, 16, 16) -> (B, 512, 1, 1)
    • audio_encoder: (B, 1, 80, 16) -> (B, 512, 1, 1)
  • color_syncnet_train_vq.py
    • vqgan config / ckpt
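The encoder shapes listed above can be sanity-checked with plain shape arithmetic. The sketch below is not the code in syncnet_vq.py; it only traces how a stack of convolutions could reduce a 16 x 16 latent grid and an 80 x 16 mel window to 1 x 1. The layer lists (`FACE_LAYERS`, `AUDIO_LAYERS`), the window length `T = 5`, and the reading of 256 as the VQGAN latent channel count are assumptions made for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(hw, layers):
    """Apply each (kernel, (stride_h, stride_w), padding) layer to a (H, W) size."""
    h, w = hw
    for kernel, (stride_h, stride_w), padding in layers:
        h = conv_out(h, kernel, stride_h, padding)
        w = conv_out(w, kernel, stride_w, padding)
    return h, w


# Hypothetical face encoder: four 3x3 convolutions with stride 2
# take the 16 x 16 latent grid through 8, 4, 2 and finally 1.
FACE_LAYERS = [(3, (2, 2), 1)] * 4

# Hypothetical audio encoder: strides chosen so that an 80 x 16 mel
# window collapses to 1 x 1 (80 -> 27 -> 9 -> 3 -> 1 along the mel axis).
AUDIO_LAYERS = [
    (3, (1, 1), 1),
    (3, (3, 1), 1),
    (3, (3, 3), 1),
    (3, (3, 2), 1),
    (3, (1, 1), 0),
]

T = 5                  # assumed number of frames per sync window
LATENT_CHANNELS = 256  # assumed VQGAN latent channels per frame

# Frames are stacked along the channel axis, giving T x 256 input channels.
face_in_channels = T * LATENT_CHANNELS
face_hw = trace((16, 16), FACE_LAYERS)
audio_hw = trace((80, 16), AUDIO_LAYERS)

print("face_encoder input channels:", face_in_channels)
print("face_encoder output H x W:", face_hw)
print("audio_encoder output H x W:", audio_hw)
```

With both encoders ending at 1 x 1 spatially, a final layer with 512 output channels yields the (B, 512, 1, 1) embeddings listed above, which can then be flattened and compared.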

Reference