- Authors: Xudong Lin, Shiyuan Huang
- Email: xudong.lin@columbia.edu, shiyuanh15@gmail.com
- Our technical report is coming soon.
- In this project, we built a system that generates a talking face from an audio clip.
- Watch our results here
- Our system consists of three modules: an audio feature extractor, a face generator, and a talking face generator.
- Matlab
- MatconvNet
- Download the dataset VoxCeleb: Audio, frames extracted at 1fps
- Find the pretrained models for the feature extractor: emotion feature, identity feature.
- Run `extract_identity_fc_voxceleb` in MATLAB.
- Note that this part is borrowed from this reimplementation of BEGAN.
- Many thanks to the authors. We made some modifications to improve the performance and to use it as an audio-face translator.
- PyTorch
- torchvision
- Download CelebA and choose the `Align&Cropped Images` zip. Unzip it and put it under the `data/` directory.
- Go into the `Data` folder and run `python face_detect.py`. This script detects and crops faces and stores them under the `Data/64_crop/` and `Data/128_crop/` folders. The detecting and cropping script is from BEGAN-tensorflow.
- Training
Train on 128x128 images:
python began.py --cuda --outf 128/ --ndf 128 --ngf 128 --gamma 0.7 --loadSize 128 --fineSize 128 --dataPath Data/128_crop/ --res 0.5
- Generate images. For example, to use the model with residual loss at 40K, run `python generate.py --netG models/celeba_res.pth --outf imgs/celeba_res`.
- This will generate 12800 images in the `outf` folder. Do the same for the model without residual loss.
- Go here to find the code for FID score computation.
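For readers unfamiliar with the metric, FID fits a Gaussian to feature vectors of real images and another to feature vectors of generated images, then reports the Fréchet distance between the two. The sketch below is not the linked FID code. It assumes diagonal covariances so that it needs only the standard library, whereas the standard metric uses full covariance matrices and a matrix square root over Inception features. The function names are hypothetical.

```python
import math
from statistics import fmean, variance


def diagonal_gaussian_stats(features):
    """Per-dimension mean and sample variance of equal-length feature vectors."""
    columns = list(zip(*features))
    means = [fmean(column) for column in columns]
    variances = [variance(column) for column in columns]
    return means, variances


def frechet_distance_diagonal(features_a, features_b):
    """Frechet distance between two Gaussians with diagonal covariances.

    Simplifying assumption: feature dimensions are treated as uncorrelated,
    so the trace term reduces to a sum over per-dimension variances.
    """
    mu_a, var_a = diagonal_gaussian_stats(features_a)
    mu_b, var_b = diagonal_gaussian_stats(features_b)
    # Squared distance between the two mean vectors.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    # Trace term: var_a + var_b - 2 * sqrt(var_a * var_b), summed per dimension.
    cov_term = sum(
        va + vb - 2.0 * math.sqrt(va * vb) for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term
```

A lower value means the two feature distributions are closer. Identical feature sets give a distance of zero.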
- You may need to change the folders in the dataloader, depending on where you put your extracted audio features.
- Training with identity features
python began_voxceleb_2.py --cuda --outf 128/ --ndf 128 --ngf 128 --gamma 0.7 --loadSize 128 --fineSize 128 --dataPath $where you put the images$ --res 0.5 --metric 0.5
- Training with emotion features
python began_voxceleb_e.py --cuda --outf 128/ --ndf 128 --ngf 128 --gamma 0.7 --loadSize 128 --fineSize 128 --dataPath $where you put the images$ --res 0.5 --metric 0.5 --nz 56
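The training commands above all pass `--gamma 0.7`. Assuming this flag plays the role of the diversity ratio from the original BEGAN formulation, the sketch below shows the standard BEGAN balance update that such a value feeds into. It is an illustration of the published update rule, not a copy of what `began.py` or the `began_voxceleb_*` scripts do. The function names, the `lambda_k` default, and the loss values are hypothetical.

```python
def began_balance_step(k, loss_real, loss_fake, gamma=0.7, lambda_k=0.001):
    """One step of the standard BEGAN balance update.

    k         : current control variable, kept in [0, 1]
    loss_real : autoencoder reconstruction loss on real images, L(x)
    loss_fake : autoencoder reconstruction loss on generated images, L(G(z))
    Returns the updated k and the convergence measure
    M = L(x) + |gamma * L(x) - L(G(z))|.
    """
    balance = gamma * loss_real - loss_fake
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    convergence = loss_real + abs(balance)
    return k_next, convergence


def discriminator_loss(loss_real, loss_fake, k):
    """BEGAN discriminator objective: L(x) - k * L(G(z))."""
    return loss_real - k * loss_fake


if __name__ == "__main__":
    k = 0.0
    # Made-up loss values, only to show how k and M evolve over a few steps.
    for loss_real, loss_fake in [(0.20, 0.10), (0.18, 0.12), (0.17, 0.13)]:
        k, m = began_balance_step(k, loss_real, loss_fake)
        print(f"k={k:.6f} M={m:.4f} D_loss={discriminator_loss(loss_real, loss_fake, k):.4f}")
```

A larger `gamma` shifts the balance toward image diversity, and a smaller one toward visual quality. The convergence measure is a convenient quantity to log when deciding which checkpoint to keep.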
- Matlab
- MatconvNet
- Now that you have the audio and the image generated from it, go to You said that to find the demo for video synthesis.
- Thanks to all the aforementioned previous works! We will fix the license issue, if there is one.