
A denoising autoencoder + adversarial loss for face swapping.


faceswap-GAN

Adds adversarial loss and perceptual loss (VGGFace) to the deepfakes autoencoder architecture.

Descriptions

GAN-v1

GAN-v2

  • FaceSwap_GAN_v2_train.ipynb: Detailed training procedures can be found in this notebook.

    1. Build and train a GAN model.
    2. Use moviepy module to output a video clip with swapped face.
  • FaceSwap_GAN_v2_test_img.ipynb: Provides a swap_face() function that requires less VRAM.

    1. Load trained model.
    2. Do single image face swapping.
  • FaceSwap_GAN_v2_test_video.ipynb

    1. Load trained model.
    2. Use moviepy module to output a video clip with swapped face.
  • faceswap_WGAN-GP_keras_github.ipynb

    • This notebook contains a GAN model class that uses WGAN-GP.
    • Perceptual loss is discarded for simplicity.
    • The WGAN-GP model gave me results similar to the LSGAN model after a comparable number (~18k) of generator updates.
    gan = FaceSwapGAN() # instantiate the class
    gan.train(max_iters=10e4, save_interval=500) # start training
  • FaceSwap_GAN_v2_sz128_train.ipynb

    • Input and output images have shape (128, 128, 3).
    • Minor updates on the architectures:
      1. Add instance normalization to generators and discriminators.
      2. Add an additional regression loss (MAE loss) on the 64x64 branch output.
    • Not compatible with _test_video and _test_img notebooks above.
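The two architecture updates in the 128x128 notebook can be sketched in NumPy as below. This is an illustrative sketch, not the notebook's code: instance normalization is shown without the learnable scale/shift that Keras layers usually add, and the 64x64 target is assumed to be produced by 2x2 average pooling of the 128x128 target (the notebook may resize differently). `instance_norm`, `downsample_2x`, and `aux_mae_loss` are hypothetical names.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each sample and each channel over its own spatial dims.

    x: array of shape (batch, height, width, channels), channels-last as in Keras.
    No learnable gamma/beta here (simplifying assumption).
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def downsample_2x(img):
    """Average-pool a (batch, H, W, C) array by 2 in each spatial dimension (assumed method)."""
    b, h, w, c = img.shape
    return img.reshape(b, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

def aux_mae_loss(out_64, target_128):
    """MAE between the 64x64 branch output and the 128x128 target shrunk to 64x64."""
    return np.mean(np.abs(out_64 - downsample_2x(target_128)))
```

Instance normalization computes statistics per image rather than per batch, which makes each face's normalization independent of the other faces in the batch.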

Others

  • dlib_video_face_detection.ipynb

    1. Detect and crop faces in a video using dlib's CNN model.
    2. Pack cropped face images into a zip file.
  • Training data: Face images of each target are expected in the ./faceA/ and ./faceB/ folders, respectively. Face images can be of any size. (Updated 3 Jan. 2018)
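A minimal sketch of collecting the training images from that layout, using only the standard library. The accepted extensions and the `list_face_images` name are assumptions for illustration; the notebooks may gather files differently. Because any image size is accepted, resizing to the model's input shape would happen at load time (not shown).

```python
from pathlib import Path

# Assumed set of accepted image extensions (compared case-insensitively).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def list_face_images(root="."):
    """Return {"A": [...], "B": [...]} with sorted image paths from root/faceA and root/faceB."""
    root = Path(root)
    return {
        target: sorted(
            p for p in (root / f"face{target}").iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTS
        )
        for target in ("A", "B")
    }
```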

Results

Below are results showing trained models transforming Hinako Sano (佐野ひなこ) into Emi Takei (武井咲).

1. Autoencoder baseline

Autoencoder based on the deepfakes script. Note that the autoencoder (AE) results can be much better if it is trained for longer.

AE_results

2. Generative Adversarial Network, GAN (version 1)

Improved output quality: Adversarial loss improves the reconstruction quality of generated images. In addition, when perceptual loss is applied, the direction of the eyeballs becomes more realistic and more consistent with the input face.
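One way to combine an adversarial term with reconstruction is sketched below using least-squares GAN (LSGAN) losses, the variant named in the WGAN-GP notes above. This is a sketch under assumptions: the loss weights are illustrative, some LSGAN formulations add a 0.5 factor, and the function names are hypothetical rather than taken from the notebooks.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push D(fake) toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

def generator_loss(d_fake, fake, target, w_adv=0.1, w_rec=1.0):
    """Weighted adversarial term plus L1 reconstruction term (weights are assumptions)."""
    return w_adv * lsgan_g_loss(d_fake) + w_rec * np.mean(np.abs(fake - target))
```

The reconstruction term keeps the output close to the target, while the adversarial term penalizes outputs the discriminator can tell apart from real faces, which is what sharpens the result.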

GAN_PL_results

VGGFace perceptual loss (PL): The following figure shows subtle differences in the eyeball direction of output faces trained with and without PL.
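Perceptual loss compares images in the feature space of a fixed, pretrained network instead of pixel space. The sketch below assumes a hypothetical `feature_fn` that returns a list of feature maps (in this repo that role is played by VGGFace layers) and uses a per-layer L1 distance; the choice of L1, the layers, and the weights are assumptions.

```python
import numpy as np

def perceptual_loss(feature_fn, fake, target, layer_weights=None):
    """Weighted sum of mean L1 distances between feature maps of fake and target images.

    feature_fn: hypothetical callable mapping an image batch to a list of feature maps.
    """
    feats_fake = feature_fn(fake)
    feats_real = feature_fn(target)
    if layer_weights is None:
        layer_weights = [1.0] * len(feats_fake)
    return sum(
        w * np.mean(np.abs(a - b))
        for w, a, b in zip(layer_weights, feats_fake, feats_real)
    )
```

Usage with a toy stand-in for the network: `perceptual_loss(lambda x: [x, x.mean(axis=-1)], fake, target)`.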

Comp PL

Smoothed bounding box (Smoothed bbox): An exponential moving average of the bounding box position over frames is introduced to eliminate jittering of the swapped face. See the GIF below for comparison.

bbox

  • A. Source face.
  • B. Swapped face, using smoothing mask (smooths the edges of the output image when pasting it back onto the input image).
  • C. Swapped face, using smoothing mask and face alignment.
  • D. Swapped face, using smoothing mask and smoothed bounding box.
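The bounding-box smoothing can be sketched as an exponential moving average over per-frame boxes. The `(x0, y0, x1, y1)` box format, the `alpha` default, and the `smooth_bboxes` name are assumptions for illustration, not values from the notebooks.

```python
def smooth_bboxes(bboxes, alpha=0.65):
    """Exponential moving average of per-frame bounding boxes.

    bboxes: iterable of (x0, y0, x1, y1) tuples, one per frame.
    alpha: weight on the previous smoothed box (assumed value);
           higher means smoother motion but more lag behind the detector.
    """
    smoothed, prev = [], None
    for box in bboxes:
        if prev is None:
            # First frame: nothing to average with yet.
            prev = tuple(float(v) for v in box)
        else:
            prev = tuple(alpha * p + (1 - alpha) * v for p, v in zip(prev, box))
        smoothed.append(prev)
    return smoothed
```

For example, `smooth_bboxes([(0, 0, 10, 10), (10, 10, 20, 20)], alpha=0.5)` moves the second box only halfway, to `(5.0, 5.0, 15.0, 15.0)`, which damps frame-to-frame jitter.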

3. Generative Adversarial Network, GAN (version 2)

Version 1 features: Most features of version 1 are inherited, including perceptual loss and smoothed bbox.

Segmentation mask prediction: The model learns a proper mask that helps handle occlusion, eliminates artifacts at bbox edges, and produces natural skin tone.

mask0

mask1  mask2

  • Left: Source face.
  • Middle: Swapped face, before masking.
  • Right: Swapped face, after masking.
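The "before masking" and "after masking" outputs above differ by how the predicted mask is applied. A common way to apply such a mask is per-pixel alpha blending, sketched below; treating the mask as a blend weight between the generated face and the source face is an assumption about how the notebooks use it, and `blend_with_mask` is a hypothetical name.

```python
import numpy as np

def blend_with_mask(generated, source, mask):
    """Alpha-blend a generated face onto the source face.

    generated, source: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W, 1); 1 keeps the generated pixel,
          0 keeps the source pixel (e.g. an occluding hand).
    """
    mask = np.clip(mask, 0.0, 1.0)  # guard against values outside [0, 1]
    return mask * generated + (1.0 - mask) * source
```

Where the mask is near 0, source pixels such as occluders or bbox edges pass through unchanged, which is how a learned mask can suppress edge artifacts.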

Mask visualization: The following GIF shows the output mask and the face bounding box.

mask_vis

  • Left: Source face.
  • Middle: Swapped face, after masking.
  • Right: Mask heatmap & face bounding box.

Optional 128x128 input/output resolution: Increase input and output size to 128x128.

Mask refinement: Tips for mask refinement are provided in the Jupyter notebooks (VGGFace ResNet50 is required). The following figure shows generated masks before and after refinement. Input faces are from the CelebA dataset.

mask_refinement

Frequently asked questions

1. Video making is slow / OOM error?

  • The input video resolution is likely too large. Try to
    reduce the input size:
    import cv2
    def process_video(input_img):
      # Resize to half the width and height.
      input_img = cv2.resize(input_img, (input_img.shape[1]//2, input_img.shape[0]//2))
      image = input_img
      ...
    or disable the CNN model for face detection:
    import face_recognition
    def process_video(...):
      ...
      #faces = face_recognition.face_locations(image, model="cnn") # Use CNN model
      faces = face_recognition.face_locations(image) # Use the default HOG model.
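A related option, shown here as a sketch rather than what the notebooks do, is to run detection on a downscaled copy of the frame and map the boxes back to full resolution, so the swapped face can still be pasted at the original size. `face_recognition.face_locations` returns `(top, right, bottom, left)` tuples; `upscale_face_locations` is a hypothetical helper.

```python
def upscale_face_locations(locations, scale=2):
    """Map boxes found on a frame shrunk by `scale` back to full-resolution coordinates.

    locations: list of (top, right, bottom, left) tuples, the order
               returned by face_recognition.face_locations.
    """
    return [tuple(int(round(v * scale)) for v in box) for box in locations]
```

For example, a box `(10, 40, 30, 20)` found on a half-size frame maps to `(20, 80, 60, 40)` on the original frame.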

2. How does it work?

  • The illustration shows a high-level, simplified (not exact) flowchart of the denoising autoencoder algorithm.
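The core idea of a denoising autoencoder is to feed the network a corrupted face and train it to reconstruct the clean one. The sketch below shows only how a training pair and its loss could be formed. Deepfakes-style pipelines typically corrupt inputs with random warping; Gaussian noise is used here purely as a simple stand-in, and the function names and `noise_std` value are assumptions.

```python
import numpy as np

def make_denoising_pair(face, rng, noise_std=0.05):
    """Return (corrupted_input, clean_target) for one training step.

    face: float array (H, W, 3) with values in [0, 1].
    Gaussian noise stands in for the random warping used in practice.
    """
    noisy = face + rng.normal(0.0, noise_std, size=face.shape)
    return np.clip(noisy, 0.0, 1.0), face

def reconstruction_loss(output, target):
    """Mean absolute error between the autoencoder output and the clean target."""
    return np.mean(np.abs(output - target))
```

Training minimizes `reconstruction_loss(model(noisy), clean)`, so the autoencoder must learn what a clean face looks like rather than copying its input.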

3. No audio in output clips?

  • Set audio=True in the video-making cell:
from moviepy.editor import VideoFileClip
output = 'OUTPUT_VIDEO.mp4'
clip1 = VideoFileClip("INPUT_VIDEO.mp4")
clip = clip1.fl_image(process_video)
%time clip.write_videofile(output, audio=True) # Set audio=True

Requirements

Acknowledgments

Code is borrowed from tjwei, eriklindernoren, fchollet, keras-contrib and deepfakes. The generative network is adapted from CycleGAN. Some illustrations are from irasutoya.