alterate_affection — a repurposing of stylegan-encoder
This repository uses Puzer/stylegan-encoder to change the affect (emotional expression) of a person in a video.
Note: You can see most (if not all) of these instructions here
- Using Docker
- ssh into your remote server
- Create a Docker container and expose a port: we will be using Jupyter Notebook, so we need to expose a port from the container
- Once created, enter as root
- If this is the first time using this docker, install some generally useful packages and create your user
- Once you are root, run the commands in [./preinstall.txt](./preinstall.txt)
- Enter as yourself. This will take you to `/home/ralcanta` (replace `ralcanta` with your username). Then `cd /u/ralcanta/`
- Go to the folder of your choice, clone this repository, and `cd` into it: `git clone https://github.com/ralcant/alterate_affection.git`
- [Optional — but highly recommended] Working with Python virtual environments:
- Start your Jupyter Notebook on the remote server
- On your local computer, create an ssh tunnel
- Access the Jupyter Notebook locally and start working on the general_video_processing notebook! If you are using a virtual environment, make sure to choose it as the kernel of your notebook before running anything.
Now let the fun begin!
After the setup, we can work with the general_video_processing notebook. The notebook should be self-explanatory, but in a general sense this is what happens (illustrative code sketches for the main steps follow this list):
- Getting everything ready: here we install the necessary packages and create any folders we might need later.
- Breaking the video into multiple frames: here we take a video and split it into individual frames using cv2. We also store the fps (frames per second) of the video, which will be useful later.
- Updating every frame: this is the heaviest and most time-consuming part of the whole notebook. The main goal is to update every frame so that the person's emotion is changed. This is where the stylegan-encoder code is most useful. We divide the work into subsections:
    - 2.1: Getting the aligned images out of every frame: here we use a modified version of stylegan-encoder's `align_images.py`. For every frame we store the position of the face in a variable called `ALL_ALIGNED_INFO`, which will be useful later.
    - 2.2: Generating the latent vectors from the aligned images: this is by far the step that takes the most time (we are talking about hours). It uses stylegan-encoder's `encode_images.py`. The latent vectors are what let us change the affect in every frame (see next step).
    - 2.3: Changing the affect of the aligned frames, and using it to change the affect of the original frames: we use the latent vectors from 2.2 and stylegan-encoder's `smile_direction` to change the emotion of every aligned frame. Then we use the values from `ALL_ALIGNED_INFO` and some image processing to put the modified face back into the original frame.
- Combining the processed frames into a video: we use cv2 for this. The output is a video of the updated frames, with no sound. For this to work we use the fps we found in Step #1.
- Extracting the audio from the original video: we use moviepy for this, storing the original video's audio as an mp3.
- Adding the audio to our processed video: Final step! We use moviepy for this too.
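
To make the frame-splitting step concrete, here is a minimal sketch of breaking a video into frames with cv2 and recording its fps. The file names and the `frames/` folder are placeholders, not the notebook's actual variables.

```python
import os
import cv2

VIDEO_PATH = "input.mp4"   # placeholder path, not the notebook's actual variable
FRAMES_DIR = "frames"      # folder where the individual frames will be written

os.makedirs(FRAMES_DIR, exist_ok=True)

capture = cv2.VideoCapture(VIDEO_PATH)
fps = capture.get(cv2.CAP_PROP_FPS)   # stored now, reused when rebuilding the video

frame_count = 0
while True:
    success, frame = capture.read()
    if not success:
        break
    # zero-padded names keep the frames in order when they are read back
    cv2.imwrite(os.path.join(FRAMES_DIR, f"frame_{frame_count:05d}.png"), frame)
    frame_count += 1

capture.release()
print(f"Wrote {frame_count} frames at {fps} fps")
```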
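Steps 2.1 and 2.2 are driven by stylegan-encoder's scripts (this repository uses a modified `align_images.py`). The sketch below only shows how the upstream scripts are typically invoked; the folder names are placeholders and the notebook's own paths and arguments may differ.

```python
import subprocess
import sys

# Placeholder folder names; adjust to the notebook's layout.
RAW_DIR = "frames"                   # frames produced by the splitting step
ALIGNED_DIR = "aligned_frames"       # cropped/aligned faces (Step 2.1)
GENERATED_DIR = "generated_frames"   # reconstructions written by the encoder
LATENT_DIR = "latents"               # one latent vector (.npy) per aligned face (Step 2.2)

# Step 2.1: detect and align the face in every frame.
subprocess.run([sys.executable, "align_images.py", RAW_DIR, ALIGNED_DIR], check=True)

# Step 2.2: optimize a latent vector for every aligned face.
# This is the slow part; expect it to run for hours on a full video.
subprocess.run(
    [sys.executable, "encode_images.py", ALIGNED_DIR, GENERATED_DIR, LATENT_DIR],
    check=True,
)
```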
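Step 2.3 boils down to adding a multiple of a learned latent direction to each frame's latent vector. A minimal sketch of that arithmetic follows; the direction file path, latent file name, and coefficients are assumptions based on stylegan-encoder's latent-direction examples, and decoding the shifted latent back into an image (done with the stylegan-encoder generator) is omitted.

```python
import numpy as np

# Assumed paths: stylegan-encoder ships learned directions such as a "smile" direction,
# and Step 2.2 leaves one latent vector per aligned frame. Adjust both to your layout.
smile_direction = np.load("ffhq_dataset/latent_directions/smile.npy")
latent = np.load("latents/frame_00000_01.npy")

def change_affect(latent_vector, direction, coeff):
    """Move a latent vector along a direction; larger |coeff| means a stronger change."""
    return latent_vector + coeff * direction

happier = change_affect(latent, smile_direction, 2.0)
sadder = change_affect(latent, smile_direction, -2.0)

# Each shifted latent is then fed to the StyleGAN generator (see stylegan-encoder's
# example notebook) to render the modified face, which is pasted back into the
# original frame using the positions stored in ALL_ALIGNED_INFO.
```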
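The recombination step stitches the processed frames back together at the original frame rate. A sketch using cv2's VideoWriter, again with placeholder names:

```python
import glob
import cv2

PROCESSED_DIR = "processed_frames"    # frames with the modified face pasted back in
OUTPUT_PATH = "output_no_audio.mp4"   # silent video; audio is added in the last step
fps = 30.0                            # use the fps recorded when splitting the video

frame_paths = sorted(glob.glob(f"{PROCESSED_DIR}/*.png"))
first_frame = cv2.imread(frame_paths[0])
height, width = first_frame.shape[:2]

fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter(OUTPUT_PATH, fourcc, fps, (width, height))
for path in frame_paths:
    writer.write(cv2.imread(path))
writer.release()
```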
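Finally, the two audio steps use moviepy. A minimal sketch, assuming moviepy 1.x's `moviepy.editor` interface and the same placeholder file names as above:

```python
from moviepy.editor import AudioFileClip, VideoFileClip

ORIGINAL_PATH = "input.mp4"           # original video (with sound)
SILENT_PATH = "output_no_audio.mp4"   # video rebuilt from the processed frames
AUDIO_PATH = "original_audio.mp3"
FINAL_PATH = "output_with_audio.mp4"

# Extract the original soundtrack as an mp3.
VideoFileClip(ORIGINAL_PATH).audio.write_audiofile(AUDIO_PATH)

# Attach that soundtrack to the processed, silent video.
processed = VideoFileClip(SILENT_PATH)
processed.set_audio(AudioFileClip(AUDIO_PATH)).write_videofile(FINAL_PATH)
```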
Phew! That was quite a lot.
- Still need to find a way to not hardcode the face dimensions
- Adding the original audio does not seem to be a good idea, as the lips in the transformed frames are no longer in sync with the sound
You can see the original readme here