/alterate_affection

Uses latent spaces of images to change the affect of a particular video


alterate_affection: a repurposing of stylegan-encoder

This repository tries to use Puzer/stylegan-encoder to change the affect (the displayed emotion) of the person in a video.

Setup:

Note: You can see most (if not all) of these instructions here

  1. Using Docker
  2. Go to the folder of your choice, clone this repository, and cd into it: git clone https://github.com/ralcant/alterate_affection.git
  3. [Optional, but highly recommended] Working with Python virtual environments:
  4. Start your Jupyter Notebook on the remote server
  5. On your local computer, create an SSH tunnel
  6. Access the Jupyter notebook locally and start working on the general_video_processing notebook! If you are using a virtual environment, make sure to choose it as the kernel of your notebook before running anything.

Working with a video:

Now let the fun begin!

After the setup, we can work with the general_video_processing notebook. The notebook should be self-explanatory, but in general terms this is what happens:

  0. Getting everything ready: Here we install the necessary packages and create any folders we might need later.
  1. Breaking the video into multiple frames: Here we take a video and split it into frames using cv2. We also store the fps (frames per second) of the video, which we need again when rebuilding the video.
  2. Updating every frame: This is the heaviest and most time-consuming part of the whole notebook. The main goal is to replace every frame with a version in which the person's emotion is changed. This is where the stylegan-encoder code is most useful. We divide the work into subsections:
    • 2.1: Getting the aligned images out of every frame: Here we use a modified version of stylegan-encoder's align_images.py code. For every frame, we store the position of the face in a variable called ALL_ALIGNED_INFO, which step 2.3 uses to put the edited face back.
    • 2.2: Generating the latent vectors from the aligned images: This is by far the step that takes the most time (we are talking about hours). It uses stylegan-encoder's encode_images.py. The latent vectors are what we use to change the affect in every frame (see next step).
    • 2.3: Changing the affect of the aligned frames, and using this to change the affect of the original frames: We use the latent vectors from 2.2 and stylegan-encoder's smile_direction to change the emotion of every aligned frame. Then we use the values from ALL_ALIGNED_INFO and some image processing to put that face into our original frame.
  3. Combining the processed frames into a video: We use cv2 for this. The output is a video of the updated frames with no sound. For this to work we use the fps we found in step 1.
  4. Extracting the audio from the original video: We use moviepy for this and store the audio of our original video as an mp3.
  5. Adding the audio to our processed video: Final step! We use moviepy for this too.

Phew! That was quite a lot.
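
The core of step 2.3 is short, though. Here is a minimal sketch, assuming each latent vector is a NumPy array of shape (18, 512) and that the smile direction is an array of the same shape. The name shift_latent, the coeff parameter, and the choice to edit only the first 8 layers are illustrative assumptions, not necessarily what the notebook does.

```python
import numpy as np


def shift_latent(latent_vector, direction, coeff, n_layers=8):
    """Return a copy of latent_vector moved along direction.

    Only the first n_layers rows are changed (an assumption in this sketch);
    the remaining rows are copied unchanged from the input.
    """
    shifted = latent_vector.copy()
    shifted[:n_layers] = (latent_vector + coeff * direction)[:n_layers]
    return shifted


# Hypothetical usage with stand-in data; real inputs would be the latent
# vectors from step 2.2 and the smile direction shipped with stylegan-encoder.
latent = np.random.randn(18, 512)
smile_direction = np.random.randn(18, 512)
more_smile = shift_latent(latent, smile_direction, coeff=1.5)
less_smile = shift_latent(latent, smile_direction, coeff=-1.5)
```

A positive coeff pushes the face towards the direction (more smile) and a negative one pushes it away, so the strength of the change can be tuned per video.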

Current limitations

  • Still need to find a way to avoid hardcoding the face dimensions
  • Adding the original audio does not seem to be a good idea, as the lips in the transformed frames are no longer in sync with the sound

You can see the original readme here