Adding adversarial loss and perceptual loss (VGGFace) to deepfakes' autoencoder architecture.
- Build and train a GAN model.
- Use moviepy module to output a video clip with swapped face.
FaceSwap_GAN_v2_train.ipynb: Detailed training procedures can be found in this notebook.
- Build and train a GAN model.
- Use moviepy module to output a video clip with swapped face.
FaceSwap_GAN_v2_test_img.ipynb: Provides a swap_face() function that requires less VRAM.
- Load trained model.
- Do single image face swapping.
FaceSwap_GAN_v2_test_video.ipynb
- Load trained model.
- Use moviepy module to output a video clip with swapped face.
faceswap_WGAN-GP_keras_github.ipynb
- This notebook contains a GAN model class that uses WGAN-GP.
- Perceptual loss is discarded for simplicity.
- The WGAN-GP model gave me results similar to the LSGAN model after a comparable number (~18k) of generator updates.
gan = FaceSwapGAN()  # instantiate the class
gan.train(max_iters=10e4, save_interval=500)  # start training
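For reference, WGAN-GP trains a critic D that scores realism without a sigmoid, and adds a gradient penalty that keeps D approximately 1-Lipschitz. The standard critic and generator objectives are written out below, where x is a real face and x̃ a generated face. This is the textbook formulation, not a transcription of the notebook; λ = 10 is the value suggested in the WGAN-GP paper, and the notebook's weighting may differ.

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{\tilde{x}}\!\left[D(\tilde{x})\right] - \mathbb{E}_{x}\!\left[D(x)\right]
  + \lambda\,\mathbb{E}_{\hat{x}}\!\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right],\\
\hat{x} &= \epsilon\, x + (1-\epsilon)\,\tilde{x}, \qquad \epsilon \sim U[0, 1],\\
\mathcal{L}_G &= -\,\mathbb{E}_{\tilde{x}}\!\left[D(\tilde{x})\right].
\end{aligned}
```

The penalty is evaluated on random interpolates x̂ between real and generated faces rather than on either set alone.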
FaceSwap_GAN_v2_sz128_train.ipynb
- Input and output images have shape (128, 128, 3).
- Minor updates on the architectures:
- Add instance normalization to generators and discriminators.
- Add an additional regression loss (MAE loss) on the 64x64 branch output.
- Not compatible with the _test_video and _test_img notebooks above.
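Instance normalization normalizes each image's feature maps on their own instead of pooling statistics across the batch. A minimal NumPy sketch of the forward pass is shown below, assuming channels-last (N, H, W, C) tensors; it only illustrates the math, while the notebook uses a Keras layer. The mae_on_64_branch helper is hypothetical: it assumes the 128x128 target is 2x2 average-pooled down to 64x64 before the MAE is taken, which may differ from how the notebook builds that target.

```python
import numpy as np


def instance_norm(x: np.ndarray, gamma: float = 1.0, beta: float = 0.0,
                  eps: float = 1e-5) -> np.ndarray:
    """Normalize every (sample, channel) slice over its spatial axes.

    x has shape (N, H, W, C). Unlike batch normalization, statistics are
    never shared across samples in the batch.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)  # shape (N, 1, 1, C)
    var = x.var(axis=(1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def mae_on_64_branch(out64: np.ndarray, target128: np.ndarray) -> float:
    """Hypothetical MAE regression loss for the 64x64 branch output.

    Assumes the 128x128 target is downscaled with 2x2 average pooling.
    """
    n, h, w, c = target128.shape
    target64 = target128.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))
    return float(np.mean(np.abs(out64 - target64)))
```

Here gamma and beta are scalars for brevity; a real layer learns one scale and shift per channel.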
dlib_video_face_detection.ipynb
- Detect/Crop faces in a video using dlib's cnn model.
- Pack cropped face images into a zip file.
Training data: Face images are supposed to be in the ./faceA/ and ./faceB/ folders, one for each target. Face images can be of any size. (Updated Jan. 3, 2018)
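A small sanity check for this folder layout can save a failed training run. The sketch below is hypothetical (collect_training_data and list_face_images are not functions from the notebooks) and assumes .jpg, .jpeg and .png are the only image extensions of interest; resizing to the model's input size is left to the training code.

```python
import glob
import os

# Assumed set of accepted extensions; adjust to match your data.
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def list_face_images(folder: str) -> list:
    """Return sorted image paths in a training folder such as ./faceA/."""
    paths = [p for p in glob.glob(os.path.join(folder, "*"))
             if p.lower().endswith(IMAGE_EXTS)]
    if not paths:
        raise FileNotFoundError(f"no face images found in {folder!r}")
    return sorted(paths)


def collect_training_data(root: str = ".") -> dict:
    """Map each target ("A" and "B") to the face images in root/faceA, root/faceB."""
    return {name: list_face_images(os.path.join(root, f"face{name}"))
            for name in ("A", "B")}
```

Calling collect_training_data() from the notebook's working directory fails fast if either folder is missing or empty.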
Below are results showing trained models transforming Hinako Sano (佐野ひなこ) into Emi Takei (武井咲).
Source video: 佐野ひなことすごくどうでもいい話?(遊戯王) ("A super trivial chat with Hinako Sano? (Yu-Gi-Oh!)")
Autoencoder based on deepfakes' script. Note that the autoencoder (AE) results could be much better if it were trained for longer.
Improved output quality: Adversarial loss improves the reconstruction quality of generated images. In addition, when perceptual loss is applied, the eyeball direction becomes more realistic and more consistent with the input face.
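The generator objective therefore combines three terms: reconstruction, adversarial and perceptual. A minimal NumPy sketch of how they could be weighted together is shown below, assuming an LSGAN-style adversarial term and an L1 distance between feature maps taken from a frozen face network such as VGGFace. The weights w_recon, w_adv and w_pl are hypothetical placeholders, not the notebook's values, and feature extraction is left to the caller.

```python
from typing import Sequence

import numpy as np


def lsgan_generator_loss(d_fake: np.ndarray) -> float:
    """LSGAN generator term: push D's scores on fakes toward the real label 1."""
    return float(np.mean((d_fake - 1.0) ** 2))


def perceptual_loss(feats_fake: Sequence[np.ndarray],
                    feats_real: Sequence[np.ndarray]) -> float:
    """Sum over layers of the mean absolute difference between feature maps.

    Feature maps are assumed to come from a frozen face-recognition network
    (e.g. VGGFace) applied to the generated face and to the real face.
    """
    return float(sum(np.mean(np.abs(f - r)) for f, r in zip(feats_fake, feats_real)))


def generator_loss(fake, real, d_fake, feats_fake, feats_real,
                   w_recon=1.0, w_adv=0.1, w_pl=0.01) -> float:
    """Weighted sum of reconstruction (L1), adversarial and perceptual terms."""
    recon = float(np.mean(np.abs(fake - real)))
    adv = lsgan_generator_loss(d_fake)
    pl = perceptual_loss(feats_fake, feats_real)
    return w_recon * recon + w_adv * adv + w_pl * pl
```

Because the perceptual term compares identity features rather than raw pixels, it can penalize details such as gaze direction that a pixel-wise L1 loss barely notices.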
VGGFace perceptual loss (PL): The following figure shows the subtle differences in eyeball direction between output faces trained with and without PL.
Smoothed bounding box (Smoothed bbox): An exponential moving average of the bounding box position over frames is introduced to eliminate jitter on the swapped face. See the gif below for a comparison.
- A. Source face.
- B. Swapped face, using smoothing mask (smoothes edges of output image when pasting it back to input image).
- C. Swapped face, using smoothing mask and face alignment.
- D. Swapped face, using smoothing mask and smoothed bounding box.
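The smoothed bbox is an exponential moving average of the detected box over consecutive frames. A minimal pure-Python sketch follows, assuming boxes are (x0, y0, x1, y1) tuples for a single tracked face; BBoxSmoother and its momentum parameter are hypothetical names, and the smoothing factor used in the notebook may differ.

```python
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


class BBoxSmoother:
    """Exponential moving average of a face bounding box across video frames."""

    def __init__(self, momentum: float = 0.65) -> None:
        # momentum is the weight kept from the previous smoothed box:
        # 0.0 disables smoothing, values close to 1.0 smooth (and lag) more.
        if not 0.0 <= momentum < 1.0:
            raise ValueError("momentum must be in [0, 1)")
        self.momentum = momentum
        self.state: Optional[BBox] = None

    def update(self, bbox: BBox) -> BBox:
        """Feed the box detected in the current frame; return the smoothed box."""
        if self.state is None:
            # First frame: nothing to average with yet.
            self.state = tuple(float(v) for v in bbox)
        else:
            m = self.momentum
            self.state = tuple(m * s + (1.0 - m) * v
                               for s, v in zip(self.state, bbox))
        return self.state

    def reset(self) -> None:
        """Forget the history, e.g. after a scene cut or a lost detection."""
        self.state = None
```

The smoothed coordinates are floats, so round them before cropping the frame. A higher momentum removes more jitter but makes the box trail fast head movements.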
Version 1 features: Most of the features from version 1 are inherited, including perceptual loss and smoothed bbox.
Segmentation mask prediction: The model learns a proper mask that helps handle occlusion, eliminate artifacts on bbox edges, and produce a natural skin tone.
- Left: Source face.
- Middle: Swapped face, before masking.
- Right: Swapped face, after masking.
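Masking decides, pixel by pixel, how much of the generated face is kept. A minimal NumPy sketch of that compositing step is shown below, assuming the mask is a per-pixel alpha in [0, 1] where 1 means "use the swapped face"; blend_with_mask is a hypothetical helper, not a function from the notebooks. A hand-made smoothing mask with feathered edges can be applied with the same formula.

```python
import numpy as np


def blend_with_mask(source: np.ndarray, swapped: np.ndarray,
                    mask: np.ndarray) -> np.ndarray:
    """Alpha-composite a swapped face onto the source face.

    source, swapped: float arrays of shape (H, W, 3), values in [0, 1].
    mask: float array of shape (H, W) or (H, W, 1), values in [0, 1];
          1 keeps the swapped pixel, 0 keeps the source pixel.
    """
    if source.shape != swapped.shape:
        raise ValueError("source and swapped must have the same shape")
    if mask.ndim == 2:
        mask = mask[..., np.newaxis]  # broadcast the mask over color channels
    mask = np.clip(mask, 0.0, 1.0)
    return mask * swapped + (1.0 - mask) * source
```

Where the mask drops to 0 (for example over a hand occluding the face), the original pixels show through, which is how a learned mask can handle occlusion.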
Mask visualization: The following gif shows output mask & face bounding box.
- Left: Source face.
- Middle: Swapped face, after masking.
- Right: Mask heatmap & face bounding box.
Optional 128x128 input/output resolution: Increase input and output size to 128x128.
Mask refinement: Tips for mask refinement are provided in the Jupyter notebooks (VGGFace ResNet50 is required). The following figure shows generated masks before/after refinement. Input faces are from the CelebA dataset.
- It is likely due to the input video's resolution being too large. Try one of the following:
- Reduce the input size:
import cv2

def process_video(input_img):
    # Resize to 1/2x width and height.
    input_img = cv2.resize(input_img, (input_img.shape[1]//2, input_img.shape[0]//2))
    image = input_img
    ...
- Or disable the CNN model for face detection:
import face_recognition

def process_video(input_img):
    ...
    #faces = face_recognition.face_locations(image, model="cnn") # Use CNN model
    faces = face_recognition.face_locations(image) # Use the default HOG-based model
- This illustration shows a very high-level and abstract flowchart of the denoising autoencoder algorithm (not exactly the same as the implementation).
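The core of a denoising autoencoder is to corrupt the input but score the reconstruction against the clean original, so the network must learn the data's structure rather than copy pixels. A minimal NumPy sketch follows, using a linear encoder/decoder trained with plain gradient descent. Additive Gaussian noise stands in for the random warping used in deepfakes' training script, and train_linear_dae with its parameters is a hypothetical illustration, not the project's model.

```python
import numpy as np


def train_linear_dae(data: np.ndarray, hidden: int = 8, noise_std: float = 0.3,
                     lr: float = 0.1, steps: int = 1000, seed: int = 0):
    """Train a linear denoising autoencoder on the rows of `data`.

    Returns (W_enc, W_dec, losses), where losses[i] is the mean squared error
    between the reconstruction of a corrupted batch and the clean data.
    """
    rng = np.random.default_rng(seed)
    _, d = data.shape
    W_enc = rng.normal(0.0, 0.1, size=(d, hidden))
    W_dec = rng.normal(0.0, 0.1, size=(hidden, d))
    losses = []
    for _ in range(steps):
        noisy = data + rng.normal(0.0, noise_std, size=data.shape)  # corrupt input
        h = noisy @ W_enc            # encode the corrupted input
        recon = h @ W_dec            # decode
        err = recon - data           # compare against the CLEAN data
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both linear layers.
        g_recon = 2.0 * err / err.size
        g_W_dec = h.T @ g_recon
        g_W_enc = noisy.T @ (g_recon @ W_dec.T)
        W_enc -= lr * g_W_enc
        W_dec -= lr * g_W_dec
    return W_enc, W_dec, losses
```

With low-rank data (for example 16-dimensional rows generated from 3 latent factors), the loss drops well below its starting value as the model learns to project away the noise.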
- Set audio=True in the video-making cell:
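from moviepy.editor import VideoFileClip

output = 'OUTPUT_VIDEO.mp4'
clip1 = VideoFileClip("INPUT_VIDEO.mp4")
clip = clip1.fl_image(process_video)  # process_video() is defined elsewhere in the notebook
%time clip.write_videofile(output, audio=True) # Set audio=True (%time is an IPython magic)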
- Keras 2
- TensorFlow 1.3
- Python 3
- OpenCV
- dlib
- face_recognition
- moviepy
Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and deepfakes. The generative network is adapted from CycleGAN. Some of the illustrations are from irasutoya.