Adding adversarial loss and perceptual loss (VGGFace) to deepfakes' auto-encoder architecture.
Date | Update |
---|---|
2018-03-03 | Model architecture: Add a new notebook which contains an improved GAN architecture. The architecture is greatly inspired by XGAN and the MS-D neural network. |
2018-02-13 | Video conversion: Add a new video processing script using MTCNN for face detection. Faster detection with a configurable threshold value. No need for CUDA-supported dlib. (New notebook: v2_test_video_MTCNN) |
2018-02-10 | Video conversion: Add an optional (default `False`) histogram matching function for color correction into the video conversion pipeline. Set `use_color_correction = True` to enable this feature. (Updated notebooks: v2_sz128_train, v2_train, and v2_test_video) |
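Histogram matching maps the swapped face's intensity distribution onto the target frame's. Below is a minimal single-channel NumPy sketch of the idea; the notebooks' actual color-correction helper may differ, and `match_histogram` is an illustrative name, not a function from this repo.

```python
import numpy as np

def match_histogram(source, template):
    """Remap source pixel values so their CDF matches the template's CDF.

    Single-channel sketch; the notebooks apply color correction per channel.
    """
    s_values, s_idx, s_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    t_values, t_counts = np.unique(template.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts).astype(np.float64) / source.size
    t_cdf = np.cumsum(t_counts).astype(np.float64) / template.size
    # For each source CDF value, find the template value with the same CDF.
    mapped = np.interp(s_cdf, t_cdf, t_values)
    return mapped[s_idx].reshape(source.shape)

src = np.array([[0, 0], [255, 255]], dtype=np.uint8)
tmpl = np.array([[10, 10], [20, 20]], dtype=np.uint8)
print(match_histogram(src, tmpl))  # values are pulled into the template's range
```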
FaceSwap_GAN_v1_train.ipynb
- Script for training the version 1 GAN model.
- Video conversion functions are also included.
FaceSwap_GAN_v2_train.ipynb (recommended for training)
- Script for training the version 2 GAN model.
- Video conversion functions are also included.
FaceSwap_GAN_v2_test_video.ipynb
- Script for generating videos.
- Using the face_recognition module for face detection.
FaceSwap_GAN_v2_test_video_MTCNN.ipynb (recommended for video conversion)
- Script for generating videos.
- Using MTCNN for face detection. Does not require CUDA-supported dlib.
faceswap_WGAN-GP_keras_github.ipynb
- This notebook contains a class of GAN model using WGAN-GP.
- Perceptual loss is discarded for simplicity.
- The WGAN-GP model gave me results similar to the LSGAN model after a comparable number (~18k) of generator updates.
```python
gan = FaceSwapGAN()  # instantiate the class
gan.train(max_iters=10e4, save_interval=500)  # start training
```
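For reference, WGAN-GP's gradient penalty λ·(‖∇D(x̂)‖₂ − 1)² is evaluated at random interpolates between real and fake samples. The toy NumPy sketch below uses a linear critic whose gradient is known analytically, so no autodiff is needed; it illustrates the penalty term only and is not the notebook's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear critic D(x) = w . x, whose gradient w.r.t. x is simply w.
w = np.array([0.6, 0.8])  # ||w|| = 1, so the penalty should be ~0

def gradient_penalty(real, fake, lam=10.0):
    """WGAN-GP penalty lam * mean((||grad D(x_hat)|| - 1)^2)."""
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake   # random interpolates
    grads = np.tile(w, (real.shape[0], 1))    # analytic gradient of linear critic
    norms = np.linalg.norm(grads, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

real = rng.normal(size=(4, 2))
fake = rng.normal(size=(4, 2))
print(gradient_penalty(real, fake))  # ≈ 0.0, since ||w|| = 1
```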
FaceSwap_GAN_v2_sz128_train.ipynb
- Input and output images have a larger shape of (128, 128, 3).
- Minor updates on the architectures:
  - Add instance normalization to generators and discriminators.
  - Add an additional regression loss (MAE loss) on the 64x64 branch output.
- Not compatible with the `_test_video` and `_test_video_MTCNN` notebooks above.
dlib_video_face_detection.ipynb
- Detect/crop faces in a video using dlib's CNN model.
- Pack cropped face images into a zip file.
Training data: Face images are supposed to be in the `./faceA/` and `./faceB/` folders, one per target respectively. Face images can be of any size.
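Given that layout, a quick sanity check of the training folders might look like the following. `count_face_images` is a hypothetical helper for illustration, not part of the repo.

```python
from pathlib import Path

def count_face_images(folder):
    """Count image files in a training folder (images can be any size)."""
    exts = {".jpg", ".jpeg", ".png"}
    p = Path(folder)
    if not p.exists():
        return 0
    return sum(1 for f in p.glob("*") if f.suffix.lower() in exts)

print(count_face_images("./faceA"), count_face_images("./faceB"))
```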
Below are results showing trained models transforming Hinako Sano (佐野ひなこ) to Emi Takei (武井咲).
Source video: 佐野ひなことすごくどうでもいい話?(遊戯王)
Autoencoder based on deepfakes' script. It should be mentioned that the autoencoder (AE) result could be much better if trained longer.
- Improved output quality: Adversarial loss improves reconstruction quality of generated images.
- VGGFace perceptual loss: Perceptual loss makes eyeball direction more realistic and consistent with the input face.
- Smoothed bounding box (smoothed bbox): An exponential moving average of the bounding box position over frames is introduced to eliminate jitter on the swapped face.
- Version 1 features: Most of the features in version 1 are inherited, including perceptual loss and smoothed bbox.
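The smoothed-bbox idea can be sketched in a few lines. The `alpha` value below is illustrative, not the notebooks' actual setting.

```python
def smooth_bbox(prev, current, alpha=0.65):
    """Exponential moving average over bbox coords (x0, y0, x1, y1).

    alpha close to 1 weights the history more, suppressing frame-to-frame jitter.
    """
    if prev is None:           # first frame: nothing to smooth against
        return current
    return tuple(alpha * p + (1 - alpha) * c for p, c in zip(prev, current))

smoothed = None
for bbox in [(10, 10, 60, 60), (14, 9, 63, 61), (11, 12, 59, 58)]:
    smoothed = smooth_bbox(smoothed, bbox)
print(smoothed)  # jittery detections are pulled toward the running average
```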
- Unsupervised segmentation mask: The model learns a proper mask that helps with handling occlusion, eliminates artifacts on bbox edges, and produces a natural skin tone.
  - From left to right: source face, swapped face (before masking), swapped face (after masking).
  - From left to right: source face, swapped face (after masking), mask heatmap.
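Applying the learned mask amounts to alpha blending the generated face into the source frame. A minimal NumPy sketch follows; the function and variable names are illustrative, not the repo's.

```python
import numpy as np

def apply_mask(source, generated, mask):
    """Alpha-blend the generated face into the source using the learned mask.

    mask values lie in [0, 1]: 1 keeps the generated pixel, 0 keeps the source.
    """
    if mask.ndim == 2:                      # broadcast a 1-channel mask over RGB
        mask = mask[..., np.newaxis]
    return mask * generated + (1.0 - mask) * source

src = np.zeros((2, 2, 3))   # toy "source frame"
gen = np.ones((2, 2, 3))    # toy "generated face"
m = np.array([[1.0, 0.0], [0.5, 0.25]])
print(apply_mask(src, gen, m)[..., 0])  # blended values equal the mask here
```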
- Optional 128x128 input/output resolution: Increase input and output size from 64x64 to 128x128.
- Mask refinement: VGGFace ResNet50 is introduced for mask refinement (as the perceptual loss). The following figure shows generated masks before/after refinement. Input faces are from the CelebA dataset.
- Mask comparison: The following figure compares (i) generated masks with (ii) face segmentations from YuvalNirkin's FCN network. Surprisingly, the FCN sometimes fails to segment out face occlusions (see the 2nd and 4th rows).
- Face detection/tracking using MTCNN and a Kalman filter: More stable detection and smoother tracking.
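A bare-bones constant-position Kalman filter over a single bbox coordinate gives the flavor of the tracking step. The noise constants `q` and `r` are illustrative, not the repo's values.

```python
def kalman_1d(measurements, q=1e-3, r=0.5):
    """Filter a sequence of noisy 1D measurements (e.g. one bbox coordinate).

    q is process noise, r is measurement noise; larger r trusts the model more.
    """
    x, p = measurements[0], 1.0   # state estimate and its variance
    out = [x]
    for z in measurements[1:]:
        p += q                    # predict: variance grows between frames
        k = p / (p + r)           # Kalman gain
        x += k * (z - x)          # update toward the new measurement
        p *= (1 - k)
        out.append(x)
    return out

print(kalman_1d([100.0, 104.0, 99.0, 103.0, 101.0]))  # smoother than the input
```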
- V2.1 update: An improved architecture has been introduced to stabilize training. The architecture is greatly inspired by XGAN and the MS-D neural network.
  - In the v2.1 architecture, we add more discriminators/losses to the GAN. Specifically:
    - GAN loss for non-masked outputs: Add two more discriminators for the non-masked outputs.
    - Semantic consistency loss (XGAN): Use the cosine distance between embeddings of real faces and reconstructed faces.
    - Domain adversarial loss (XGAN): Encourage embeddings to lie in the same subspace.
    - (WIP) Frame loss: An L1 loss between the outputs of the current and previous frames, resulting in smoother transitions in the output video.
  - One `res_block` in the decoder is replaced by an MS-D network (default depth = 16) for output refinement.
    - This is a very inefficient implementation of the MS-D network.
  - Preview images are saved in the `./previews` folder.
  - (WIP) Random motion blur as data augmentation, reducing ghosting in the output video.
  - FCN8s for face segmentation is introduced to improve masking in video conversion (default `use_FCN_mask = False`).
    - To enable this feature, a Keras weights file should be generated through the Jupyter notebook provided in this repo.
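The semantic consistency term can be sketched as a mean cosine distance between encoder embeddings of real and reconstructed faces. This NumPy sketch is illustrative only; the actual loss lives in the notebook's Keras graph.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-8):
    """Mean (1 - cosine similarity) between two batches of embeddings."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a_n * b_n, axis=1)))

emb_real = np.array([[1.0, 0.0], [0.0, 1.0]])     # toy "real face" embeddings
emb_recon = np.array([[1.0, 0.0], [1.0, 0.0]])    # toy "reconstruction" embeddings
print(cosine_distance(emb_real, emb_recon))  # ≈ 0.5: one pair aligned, one orthogonal
```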
- This is likely due to the input video's resolution being too high; modifying the parameters in step 13 or 14 will solve it.
  - First, increase `video_scaling_offset = 0` to 1 or higher.
  - If it doesn't help, set `manually_downscale = True`.
  - If the above still do not help, disable the CNN model for face detection:

    ```python
    def process_video(...):
        ...
        #faces = get_faces_bbox(image, model="cnn")  # Use CNN model
        faces = get_faces_bbox(image, model="hog")   # Use default HOG detector
    ```
- This illustration shows a very high-level, abstract (and not exactly faithful) flowchart of the denoising autoencoder algorithm. The objective functions look like this.
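Written out roughly, the generator objective combines the adversarial, reconstruction, and perceptual terms described above. The λ weights here are illustrative placeholders rather than the notebooks' exact coefficients, and the v2/v2.1 mask and consistency terms are omitted:

```latex
\mathcal{L}_G \;=\; \mathcal{L}_{adv}
  \;+\; \lambda_{rec}\,\lVert x - \hat{x} \rVert_1
  \;+\; \lambda_{perc}\,\mathcal{L}_{VGGFace}(x, \hat{x})
```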
- Set `audio=True` in the video-making cell:

  ```python
  output = 'OUTPUT_VIDEO.mp4'
  clip1 = VideoFileClip("INPUT_VIDEO.mp4")
  clip = clip1.fl_image(process_video)
  %time clip.write_videofile(output, audio=True)  # Set audio=True
  ```
- The default setting transforms face B to face A.
- To transform face A to face B, modify the following parameters depending on your current running notebook:
  - Change `path_abgr_A` to `path_abgr_B` in `process_video()` (step 13/14 of v2_train.ipynb and v2_sz128_train.ipynb).
  - Change `whom2whom = "BtoA"` to `whom2whom = "AtoB"` (step 12 of v2_test_video.ipynb).
- Keras 2
- TensorFlow 1.3
- Python 3
- OpenCV
- moviepy
- dlib (optional)
- face_recognition (optional)
Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and deepfakes. The generative network is adapted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.