A deep learning network to transfer body movements from one individual to another, e.g. mimicking dance steps. First of all, the project has not produced very promising results; all I intend to do is give you a direction to think in. The method used here can be adopted in many situations. For motivation, let me walk you through some of the preliminary results.
Going into the details of the project, I'll start with the basic idea. You can't simply train a deep learning network to map a given person to Spider-Man, because the solution would then be specific to that particular person. You need something that is the same for all people to act as an intermediate representation. What I used is the pose estimation technique from the paper Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields.
I therefore trained a conditional GAN to map pose images to Spider-Man. Once the GAN is trained, all you need to do is take a video, convert it into frames, convert the frames to poses (stick images :P), and feed them to the generator. Make a video from the generated frames and you are done. Sounds very easy, right? Yeah, it is.
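As a rough picture of that inference loop, the sketch below reads a video, converts each frame to a pose, and runs it through the generator. Note that estimate_pose and generator are hypothetical placeholders standing in for the pose-estimation model and the trained pix2pix generator; they are not functions from this repo.

# Minimal sketch of the inference pipeline: video -> frames -> poses -> generator -> video.
# estimate_pose and generator are hypothetical placeholders, not functions from this repo.
import cv2

def transfer_video(src_path, dst_path, estimate_pose, generator, size=(256, 256)):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        pose = estimate_pose(cv2.resize(frame, size))  # stick-figure pose image for this frame
        fake = generator(pose)                         # pix2pix output: target person in that pose (BGR uint8)
        out.write(fake)
    cap.release()
    out.release()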
-
Download the dataset and preprocess: in our case, download a video of your choice. Make sure the video has the same background throughout and shows the full body of the same person. Once this is done, preprocess the video to convert it into frames. These frames are the ground truth data. Each frame should then be converted to its corresponding pose, and the frames and their poses stitched together.
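A rough sketch of this preprocessing is shown below, assuming a hypothetical frame_to_pose(frame) function that renders the stick-figure pose for a frame (the pose-estimation step mentioned above). The function name, paths, and folder layout are illustrative, not part of this repo.

# Sketch of the preprocessing step: extract frames, render poses, stitch them side by side.
# frame_to_pose is a hypothetical placeholder; paths and names are illustrative only.
import os
import cv2
import numpy as np

def build_dataset(video_path, out_dir, frame_to_pose, size=(256, 256)):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)                     # ground-truth frame
        pose = cv2.resize(frame_to_pose(frame), size)       # corresponding pose (stick image)
        pair = np.hstack([pose, frame])                     # stitched pose|frame pair (aligned format)
        cv2.imwrite(os.path.join(out_dir, f"{idx}.png"), pair)
        idx += 1
    cap.release()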
-
Once this is done, you are ready to train your pix2pix network to learn a mapping from poses to ground truth:
python train.py --dataroot ./datasets/dance_mimic --name dance_mimic_pix2pix --model pix2pix --which_model_netG unet_256
-
To view training results and loss plots, run
python -m visdom.server
and click the URL http://localhost:8097. To see more intermediate results, check out ./checkpoints/dance_mimic_pix2pix/web/index.html
-
Model will be saved at ./checkpoints/dance_mimic_pix2pix/
-
Download the dataset and preprocess it as described above for training. You can keep the images aligned (pose frame and actual frame stitched together) or use single pose images. It's better to keep them aligned so you can visualize how good the results are.
-
Once this is done, you are ready to test your pix2pix network. For aligned frames:
python test.py --dataroot ./datasets/dance_mimic --name dance_mimic_pix2pix --model pix2pix --which_model_netG unet_256 --which_direction BtoA --dataset_mode aligned --norm batch
For single images:
python test.py --dataroot ./datasets/dance_mimic --name dance_mimic_pix2pix --model pix2pix --which_model_netG unet_256 --which_direction BtoA --dataset_mode single --norm batch
The test results will be saved here: ./results/dance_mimic_pix2pix/latest_val/images
-
- From here, the frames can be joined together to make a complete video. The files named i_fake_B.png are the generated frames. Use take_frames.py to collect these images and store them in a separate folder (a sketch of this step appears after this list). Then run the command below in the terminal to convert the frames into a video.
ffmpeg -r 60 -f image2 -s 256x256 -i %d.png -vcodec libx264 -crf 25 -pix_fmt yuv420p spiderman.mp4
- To put two videos side by side, use the following (make sure you change the file names appropriately):
ffmpeg -i man.mp4 -i spiderman.mp4 -filter_complex '[0:v]pad=iw*2:ih[int];[int][1:v]overlay=W/2:0[vid]' -map [vid] -c:v libx264 -crf 23 -preset veryfast output.mp4
- In case you want a GIF of the video, use:
ffmpeg -i output.mp4 -pix_fmt rgb24 output.gif
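For the frame-collection step mentioned above, a sketch along these lines (illustrative only, not the repo's take_frames.py itself) copies the generated i_fake_B.png results into a separate folder and renames them 0.png, 1.png, ... so that the ffmpeg %d.png pattern reads them in order.

# Sketch of a frame-collection step (not the repo's take_frames.py itself).
# Assumes the results are named i_fake_B.png as described above; paths are illustrative.
import glob
import os
import shutil

results_dir = "./results/dance_mimic_pix2pix/latest_val/images"
frames_dir = "./generated_frames"
os.makedirs(frames_dir, exist_ok=True)

fake_frames = sorted(glob.glob(os.path.join(results_dir, "*_fake_B.png")),
                     key=lambda p: int(os.path.basename(p).split("_")[0]))
for i, path in enumerate(fake_frames):
    shutil.copy(path, os.path.join(frames_dir, f"{i}.png"))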
For real-time testing on your webcam:
python web_spiderman.py
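web_spiderman.py is the repo's script; purely as a rough picture of what such a real-time loop does, the sketch below grabs webcam frames and pushes them through the same pose-to-generator path. As before, estimate_pose and generator are hypothetical placeholders.

# Rough sketch of a real-time webcam loop (illustrative only; web_spiderman.py is the actual script).
import cv2

def run_webcam(estimate_pose, generator, size=(256, 256)):
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        pose = estimate_pose(cv2.resize(frame, size))  # pose image for the current webcam frame
        fake = generator(pose)                         # generated target-person frame
        cv2.imshow("transferred", fake)
        if cv2.waitKey(1) & 0xFF == ord("q"):          # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()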
-
The 256x256 pix2pix model trained on the Spider-Man video is at https://drive.google.com/open?id=1GZaD1JDuD4OD6mDqJPIP1NfT_f_nAqdm Put the downloaded folder inside the folder named checkpoints in the current directory. The dance_mimic folder contains the trained PyTorch model saved every 5 epochs.
-
The pose estimation model is at https://drive.google.com/file/d/1Yv7DVeLqWMJPGic3snWTwZ1Cn5TGW84t/view?usp=sharing Put the downloaded file inside the folder named 'model' (not 'models').
-
A pretrained model for HD video will be made available soon.