
Virtual Try-on project

@Author: Alex Nguyen | Gettysburg College

This repo is a composition of multiple applications that together generate a try-on image, given an input image of a person and an image of a clothing item.

Experiment

Approach

  1. According to the VITON paper, the process of producing a try-on image is motivated by a self-supervised learning approach, and the pipeline can be described as follows:

    • Firstly, the person image is decomposed into a set of image channels called the person representation. According to the paper, the person representation is clothing-agnostic: it contains a set of features (see image), including pose, body parts, and face and hair, which serve as a prior to constrain the synthesis process.

    [figure: the channels of the clothing-agnostic person representation]

    • These features are then concatenated with the clothing image and passed through an encoder-decoder network with skip connections, similar to U-Net, which predicts the try-on image; training uses a perceptual loss against the ground-truth photo of the person wearing the clothing. A minimal sketch of such a generator follows this step.
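To make step 1 concrete, here is a rough TensorFlow/Keras sketch of such a coarse generator: the person representation (the channel count of 22 is purely illustrative) is concatenated with the RGB clothing image, passed through a small U-Net-style encoder-decoder, and trained with a VGG16-based perceptual loss. This is an assumption-laden illustration, not the exact VITON architecture (which also predicts a clothing mask and combines L1 with perceptual terms):

```python
# Illustrative coarse try-on generator: person representation + clothing
# image in, try-on image out. Layer sizes are NOT the exact VITON config.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(height=256, width=192, repr_channels=22):
    # Person representation channels + 3 RGB clothing channels, concatenated.
    inputs = layers.Input(shape=(height, width, repr_channels + 3))

    # Encoder with skip connections (U-Net style).
    skips = []
    x = inputs
    for filters in (64, 128, 256, 512):
        x = layers.Conv2D(filters, 4, strides=2, padding="same",
                          activation="relu")(x)
        skips.append(x)

    # Decoder mirrors the encoder and reuses the skip tensors.
    for filters, skip in zip((256, 128, 64), reversed(skips[:-1])):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same",
                                   activation="relu")(x)
        x = layers.Concatenate()([x, skip])

    x = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                               activation="tanh")(x)
    return tf.keras.Model(inputs, x)

# Perceptual loss: compare VGG16 features of the prediction against the
# ground-truth photo of the person wearing the clothing. (Proper VGG input
# preprocessing is omitted for brevity.)
vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
feat = tf.keras.Model(vgg.input, vgg.get_layer("block3_conv3").output)
feat.trainable = False

def perceptual_loss(y_true, y_pred):
    return tf.reduce_mean(tf.abs(feat(y_true) - feat(y_pred)))
```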
  2. To achieve the pose detection feature, MediaPipe from Google is used. To test the pose detection, please refer to the pose documentation on the library's page, and open pose_notebook.py to test it on your webcam. Note that this functionality has also been imported into the main pipeline in notebook.py. A minimal usage sketch follows.
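As a rough illustration of how the MediaPipe pose solution is typically driven on a static image (the exact code in pose_notebook.py may differ):

```python
# Minimal MediaPipe pose-extraction sketch (static-image mode).
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_pose_landmarks(image_path):
    image = cv2.imread(image_path)
    with mp_pose.Pose(static_image_mode=True) as pose:
        # MediaPipe expects RGB input; OpenCV loads images as BGR.
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    # Each landmark carries normalized (x, y) coordinates plus visibility;
    # these can be rasterized into the pose channels of the representation.
    return [(lm.x, lm.y, lm.visibility)
            for lm in results.pose_landmarks.landmark]
```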

  3. To achieve the human parsing feature, a fully convolutional encoder-decoder with a U-Net architecture was implemented, using a pretrained MobileNetV2 ('imagenet' weights) as the encoder and deconvolutional (upsampling) layers taken from the pix2pix TensorFlow example as the decoder. The idea also follows the human parsing papers listed in the references; a sketch of this network follows the result figures below.

    • Here are the loss and accuracy achieved by the U-Net with MobileNetV2, along with the results:

[figure: sample pose and binary human parsing, i.e. each pixel is either 1 (human) or 0 (background)]


[figures: an input example, the human body parts parsing result, and the full input to the VITON pipeline]
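The parsing network in step 3 closely matches the official TensorFlow segmentation tutorial (MobileNetV2 encoder, pix2pix upsample blocks); here is a sketch along those lines. The input size (128) and the two-class output (human/background, as in the binary parsing above) are assumptions for illustration:

```python
# Sketch of the parsing U-Net: frozen pretrained MobileNetV2 encoder,
# pix2pix upsample blocks as the decoder. Mirrors the TF segmentation
# tutorial; sizes and class count are illustrative.
import tensorflow as tf
from tensorflow_examples.models.pix2pix import pix2pix

def build_parser(num_classes=2, size=128):
    base = tf.keras.applications.MobileNetV2(
        input_shape=[size, size, 3], include_top=False, weights="imagenet")

    # Encoder activations at decreasing resolutions, used as skips.
    layer_names = ["block_1_expand_relu", "block_3_expand_relu",
                   "block_6_expand_relu", "block_13_expand_relu",
                   "block_16_project"]
    encoder = tf.keras.Model(
        base.input, [base.get_layer(n).output for n in layer_names])
    encoder.trainable = False

    up_stack = [pix2pix.upsample(f, 3) for f in (512, 256, 128, 64)]

    inputs = tf.keras.layers.Input(shape=[size, size, 3])
    *skips, x = encoder(inputs)
    for up, skip in zip(up_stack, reversed(skips)):
        x = up(x)
        x = tf.keras.layers.Concatenate()([x, skip])

    # Final transpose conv back to input resolution; one logit per class
    # (2 classes for the binary human/background mask shown above).
    x = tf.keras.layers.Conv2DTranspose(
        num_classes, 3, strides=2, padding="same")(x)
    return tf.keras.Model(inputs, x)
```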


  1. Results of the first-phase VITON network: [figure]
  2. Shape Context Matching and TPS Warping: [figure] (a minimal warping sketch follows below)
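For the TPS warping step, here is a minimal sketch using OpenCV's thin-plate-spline shape transformer (requires opencv-contrib-python). The point correspondences are hard-coded stand-ins; in the VITON pipeline they would come from shape context matching between clothing masks:

```python
# Minimal thin-plate-spline warp: move `source_pts` in `image` onto
# `target_pts`. Correspondences are given explicitly here, not estimated
# by shape context matching.
import cv2
import numpy as np

def tps_warp(image, source_pts, target_pts):
    # The shape transformer expects point arrays shaped (1, N, 2).
    src = np.asarray(source_pts, np.float32).reshape(1, -1, 2)
    dst = np.asarray(target_pts, np.float32).reshape(1, -1, 2)
    matches = [cv2.DMatch(i, i, 0) for i in range(src.shape[1])]

    tps = cv2.createThinPlateSplineShapeTransformer()
    # warpImage uses backward mapping, so the transformation is estimated
    # from the target shape back to the source shape.
    tps.estimateTransformation(dst, src, matches)
    return tps.warpImage(image)
```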

Installation

  • This repo is in an initial state and only contains Jupyter-notebook versions of the code (not the full pipeline library/API). To experiment with and execute the code cells in each notebook.py file, please install the required dependencies (the original list appears to be missing; the notebooks use at least TensorFlow, MediaPipe, and OpenCV, plus the tensorflow_examples package for the pix2pix upsampling layers).

  • Here are some additional datasets that you can look over:

References

  1. Try-on paper: This paper presents the whole basic pipeline; although it is not the state-of-the-art approach, it serves as the fundamental baseline.
@inproceedings{8578885,
  author={X. {Han} and Z. {Wu} and Z. {Wu} and R. {Yu} and L. S. {Davis}},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition}, 
  title={VITON: An Image-Based Virtual Try-on Network}, 
  year={2018},
  volume={},
  number={},
  pages={7543-7552},
  doi={10.1109/CVPR.2018.00787}}
  2. Human parsing paper: This paper experiments with and compares approaches to the human parsing task.
@inproceedings{_,
    author = {Gong, Ke and Liang, Xiaodan and Zhang, Dongyu and Shen, Xiaohui and Lin, Liang},
    year = {2017},
    month = {07},
    pages = {6757-6765},
    title = {Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing},
    doi = {10.1109/CVPR.2017.715}
}
  3. FCN network used in human parsing:
@inproceedings{7298965,  
    author={J. {Long} and E. {Shelhamer} and T. {Darrell}},  
    booktitle={2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},   
    title={Fully convolutional networks for semantic segmentation},   
    year={2015},  
    volume={},  
    number={},  
    pages={3431-3440},  
    doi={10.1109/CVPR.2015.7298965}}