Code for the final project of COMS 4731 Computer Vision

Authors: Xudong Lin, Shiyuan Huang
Email: xudong.lin@columbia.edu, shiyuanh15@gmail.com
Our technical report is coming soon.

Audio-conditioned talking face generation

In this project, we built a system that generates a talking face from an audio clip.
Watch our results here
Our system consists of three modules: an audio feature extractor, a face generator, and a talking face generator.

Audio feature extractor

Prerequisites

Matlab
MatConvNet

Instructions

Download the VoxCeleb dataset: audio, frames extracted at 1 fps.
Find the pretrained models for the feature extractor: emotion feature, identity feature.
Run extract_identity_fc_voxceleb in Matlab.

Face generator

Note that this part is borrowed from this reimplementation of BEGAN.
Many thanks to the authors. We made some modifications to improve the performance and to use it as an audio-face translator.

Prerequisites

PyTorch
torchvision

Training on CelebA

Download CelebA and choose the Align&Cropped Images zip. Unzip it and put it under the data/ directory.
Go into the folder Data and run python face_detect.py. This script detects and crops faces and stores them under the Data/64_crop/ and Data/128_crop/ folders. The detecting and cropping script is from BEGAN-tensorflow.

Training

Train on 128x128 images

python began.py --cuda --outf 128/ --ndf 128 --ngf 128 --gamma 0.7 --loadSize 128 --fineSize 128 --dataPath Data/128_crop/ --res 0.5
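
For readers unfamiliar with BEGAN, the role of the --gamma flag can be sketched as follows. This is a minimal illustration of the balance update and convergence measure described in the BEGAN paper, written on plain floats. It is not taken from began.py: the function names and the lambda_k default of 0.001 are assumptions made for illustration, and the --res option added in this project is not modeled here.

```python
def began_balance_step(k, loss_real, loss_fake, gamma=0.7, lambda_k=0.001):
    """One BEGAN bookkeeping step on scalar autoencoder losses.

    loss_real: discriminator reconstruction loss on real images, L(x)
    loss_fake: discriminator reconstruction loss on generated images, L(G(z))
    gamma:     diversity ratio (the value passed as --gamma above)
    lambda_k:  step size for k (assumed default, not read from the repo)
    Returns the updated balance weight k and the convergence measure M.
    """
    balance = gamma * loss_real - loss_fake
    # Proportional control: k grows when generated images are reconstructed
    # too well relative to gamma * L(x), and is clamped to [0, 1].
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    # Global convergence measure: M = L(x) + |gamma * L(x) - L(G(z))|
    measure = loss_real + abs(balance)
    return k_next, measure


def discriminator_loss(k, loss_real, loss_fake):
    """Discriminator objective L_D = L(x) - k * L(G(z))."""
    return loss_real - k * loss_fake


if __name__ == "__main__":
    k = 0.0
    # Hypothetical loss values, for illustration only.
    for loss_real, loss_fake in [(0.10, 0.05), (0.09, 0.06), (0.08, 0.07)]:
        k, m = began_balance_step(k, loss_real, loss_fake)
        print(f"k={k:.6f}  M={m:.4f}")
```

A lower gamma trades image diversity for sharper samples, and the measure M is a convenient quantity to log when comparing runs.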

Reproduce the FID score

Generate images. For example, to use the model with residual loss at 40K:

python generate.py --netG models/celeba_res.pth --outf imgs/celeba_res

This will generate 12800 images in the outf folder. Do the same for the model without residual loss.
Go here to find the code for the FID score computation.
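
Before computing FID, it may help to confirm that each output folder really holds the expected 12800 images. The following is a small sanity-check sketch and not part of the repository: the helper names and the list of image suffixes are assumptions, since the format written by generate.py is not stated here.

```python
from pathlib import Path

# Assumed output formats; adjust to whatever generate.py actually writes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(folder):
    """Count image files directly inside `folder` (non-recursive)."""
    return sum(
        1
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_generated(folder, expected=12800):
    """Return (ok, found), where ok is True if exactly `expected` images exist."""
    found = count_images(folder)
    return found == expected, found
```

For example, check_generated("imgs/celeba_res") returns a pair whose first element tells whether the folder is complete and whose second element is the number of images found.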

Training on VoxCeleb emotion features

You may need to change the folders in the dataloader depending on where you put your extracted audio features.

Training with identity features

  python began_voxceleb_2.py --cuda --outf 128/ --ndf 128 --ngf 128 --gamma 0.7 --loadSize 128 --fineSize 128 --dataPath $where you put the images$ --res 0.5 --metric 0.5

Training with emotion features

  python began_voxceleb_e.py --cuda --outf 128/ --ndf 128 --ngf 128 --gamma 0.7 --loadSize 128 --fineSize 128 --dataPath $where you put the images$ --res 0.5 --metric 0.5 --nz 56
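
The two commands above differ only in the script name and the extra --nz 56 flag, so the image path placeholder can be filled in programmatically. The sketch below assembles the argument lists using only the flags shown above. The build_train_command helper and the example path are hypothetical and not part of the repository.

```python
import shlex

# Flags shared by both commands above, copied verbatim.
SHARED_FLAGS = [
    "--cuda", "--outf", "128/", "--ndf", "128", "--ngf", "128",
    "--gamma", "0.7", "--loadSize", "128", "--fineSize", "128",
]


def build_train_command(feature, data_path):
    """Return the argv list for training with 'identity' or 'emotion' features."""
    if feature == "identity":
        script, extra = "began_voxceleb_2.py", []
    elif feature == "emotion":
        script, extra = "began_voxceleb_e.py", ["--nz", "56"]
    else:
        raise ValueError(f"unknown feature type: {feature!r}")
    return [
        "python", script, *SHARED_FLAGS,
        "--dataPath", data_path, "--res", "0.5", "--metric", "0.5", *extra,
    ]


if __name__ == "__main__":
    # "path/to/frames/" is a placeholder for where you put the images.
    print(shlex.join(build_train_command("emotion", "path/to/frames/")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems when the image path contains spaces.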

Talking face generator

Prerequisites

Matlab
MatConvNet

Instructions

Now that you have the audio and the image generated from it, go to You said that to find the demo for video synthesis.

Acknowledgement

Thanks to all the aforementioned previous works! We will fix the license issue, if there is one.
