
Deep Feature Interpolation (CVPR 2017)


1 Introduction

Deep Feature Interpolation (DFI) edits the content of an image by interpolating the feature representations of a deep convolutional neural network. DFI is described in Deep Feature Interpolation for Image Content Changes. Project website.
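In brief, DFI maps the input and a set of neighbor images into deep feature space, computes an attribute vector as the difference between the mean features of neighbors with and without the desired attribute, shifts the input's features along that vector, and inverts the result back to pixels. The sketch below illustrates only the interpolation step; phi and the set selection are illustrative stand-ins, not this repository's actual API.

  import numpy as np

  def dfi_shift(phi, source_images, target_images, x, delta):
      """Illustrative DFI interpolation step (not the repo's exact code).

      phi           -- deep feature extractor, e.g. stacked VGG conv activations
      source_images -- neighbors of x WITHOUT the attribute
      target_images -- neighbors of x WITH the attribute
      delta         -- interpolation strength (the --delta option)
      """
      # Attribute vector: difference of mean deep features of the two sets.
      w = (np.mean([phi(t) for t in target_images], axis=0)
           - np.mean([phi(s) for s in source_images], axis=0))
      # Shift the input's features along the attribute direction; the shifted
      # features are inverted back to an image by a reconstruction backend
      # (section 3.1).
      return phi(x) + delta * w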

Please cite this paper if you use our work:

Paul Upchurch¹, Jacob Gardner¹, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, Kilian Weinberger. Deep Feature Interpolation for Image Content Changes. In Computer Vision and Pattern Recognition (CVPR), 2017.

¹ Authors contributed equally.

BibTeX:

  @inproceedings{upchurch2017deep,
    title={{D}eep {F}eature {I}nterpolation for Image Content Changes},
    author={Upchurch, Paul and Gardner, Jacob and Pleiss, Geoff and Pless, Robert and Snavely, Noah and Bala, Kavita and Weinberger, Kilian},
    booktitle={Computer Vision and Pattern Recognition (CVPR)},
    year={2017}
  }

1.1 Requirements

You will need Linux, at least 9 GB of main memory, and a recent GPU with at least 3 GB of memory to transform high-resolution images.

The Caffe and Torch deep learning frameworks should be installed so that import caffe (in Python) and the th command both work.
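As a quick sanity check of both installations (illustrative commands; the exact setup varies by system):

  python -c "import caffe"       # exits silently if Caffe's Python bindings work
  th -e "print('Torch OK')"      # trepl's -e flag runs a Lua one-liner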

Python packages:

  pip install numpy scikit-image Pillow opencv-python scipy dlib lutorpy execnet torch torchvision protobuf

2 Demonstrations

2.1 Demo1


This script produces six kinds of transformations (older, mouth open, eyes open, smiling, moustache, eyeglasses) on LFW faces.

  python demo1.py
  # ~1.3 minutes to reconstruct each image (using 1 Titan-X)
  # Total time: 9.0 minutes

2.2 Demo2


This script ages or adds facial hair to a front-facing portrait at resolutions up to 1000x1000.

Preparing an Image Database

This demo requires a database of high-resolution images, which is used to select source and target images for the transformation. Follow the instructions at datasets/facemodel/README.md to collect the database.

Our method requires the database to contain at least 400 source/target images that match the gender and facial expression of the input photo. A warning message is printed if there are not enough images.
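Source/target selection could look roughly like the following sketch: filter the database by attribute labels, then keep the K nearest neighbors in deep feature space. The real pipeline also matches gender and expression; db_feats, db_attrs, and the distance metric here are illustrative assumptions, not the repo's data structures.

  import numpy as np

  def select_sets(db_feats, db_attrs, x_feat, attribute, K=100):
      """Illustrative neighbor selection (not the repo's actual code).

      db_feats  -- (N, D) array of deep features for the database images
      db_attrs  -- dict of boolean arrays, e.g. db_attrs['facehair']
      x_feat    -- (D,) deep features of the input photo
      """
      has = db_attrs[attribute]
      if has.sum() < 400 or (~has).sum() < 400:
          print('warning: fewer than 400 matching database images')

      def nearest(mask):
          idx = np.flatnonzero(mask)
          dists = np.linalg.norm(db_feats[idx] - x_feat, axis=1)
          return idx[np.argsort(dists)[:K]]

      return nearest(~has), nearest(has)  # (source set, target set)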

Test images

The source of each test image and our test masks are in datasets/test/. We find that DFI works well on photographs of natural faces that are unoccluded, front-facing, and lit by natural or office-environment lighting.

  python demo2.py <transform> <image> --delta <values>
  # e.g. python demo2.py facehair images/facemodel/lfwgoogle/Aaron_Eckhart/00000004.jpg --delta 2.5,3.5,4.5
  # possible transforms are 'facehair', 'older', or 'younger'
  # 2.1 minutes to reconstruct an 800x1000 image (using 1 Titan-X)
  # Total time (800x1000 image): 7.5 minutes

2.3 Demo3


This script fills in missing portions of shoe images.

To reconstruct one of the shoe images:

  python demo3.py
  # 1.3 minutes to reconstruct each image (using 1 Titan-X)
  # Total time: 1.5 minutes

3 Options

3.1 Reconstruction backend (--backend)

We provide two backends. Caffe+SciPy uses Caffe to run VGG forward/backward on the GPU, then uses SciPy to call the FORTRAN implementation of L-BFGS-B on the CPU. Torch uses PyTorch to run the entire reconstruction on the GPU. Torch is faster than Caffe+SciPy but produces a lower-quality result. Caffe+SciPy is the default for the LFW and UT-Zappos50K demonstrations; Torch is the default for the high-resolution face demonstration.
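For intuition, the Caffe+SciPy path amounts to minimizing the squared distance between the image's deep features and the shifted target features with SciPy's L-BFGS-B. A minimal sketch, assuming a hypothetical feature_and_grad helper that wraps the network's forward/backward passes:

  import numpy as np
  from scipy.optimize import fmin_l_bfgs_b

  def reconstruct(target_feat, x0, feature_and_grad, maxiter=150):
      """Sketch of L-BFGS-B feature inversion (illustrative, not the repo's code).

      feature_and_grad(img) is assumed to return the network's features for img
      and a closure that backpropagates a feature-space gradient to pixel space.
      """
      shape = x0.shape

      def loss_and_grad(flat):
          img = flat.reshape(shape).astype(np.float32)
          feat, backprop = feature_and_grad(img)
          diff = feat - target_feat
          loss = 0.5 * float(np.sum(diff ** 2))   # squared feature distance
          grad = backprop(diff)                   # chain rule through the net
          return loss, grad.ravel().astype(np.float64)

      x_opt, _, _ = fmin_l_bfgs_b(loss_and_grad, x0.ravel().astype(np.float64),
                                  maxiter=maxiter)
      return x_opt.reshape(shape)

The --iter option (section 3.3) corresponds to the maxiter argument in this sketch.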

Memory

The Torch model needs 6 GB of GPU memory. The Caffe+SciPy backend needs 3 GB of GPU memory to transform high-res images.

3.2 Interpolation "amount" (--delta)

The delta parameter controls the strength of the transformation: zero produces no change at all, and larger values produce stronger transformations. You can pass multiple values, e.g., --delta 0.1,0.3,0.5, to try several transformations in one run (this is faster than running them individually).

For most transformations, an ideal delta value will be between 0.0 and 1.0 with --scaling beta (between 0.0 and 5.0 with --scaling none).
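In the notation of the sketch in section 1, each delta simply rescales the same attribute vector w, so the expensive steps (database search and neighbor feature extraction) are shared across all requested values and only the per-delta reconstruction repeats:

  # Illustrative only; reuses the hypothetical names from the earlier sketches.
  outputs = []
  phi_x = phi(x)                        # expensive, computed once
  for delta in (0.1, 0.3, 0.5):         # --delta 0.1,0.3,0.5
      outputs.append(reconstruct(phi_x + delta * w, x, feature_and_grad))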

3.3 Speed (--iter)

The iter parameter controls how many L-BFGS-B optimization steps are used for reconstruction. Fewer steps mean less time but lower quality; do not set this parameter below 150. With --iter 150 the Torch backend takes 20 seconds to reconstruct a 200x200 image and 3 minutes to reconstruct a 725x1000 image.

3.4 Other options

  • --device_id - specify which GPU to use
  • --K - number of nearest neighbors used to construct the source/target sets
  • --postprocess color - match the colors of the final image to the original image
  • --postprocess mask - apply a mask (for input foo.jpg the mask should be named foo-mask.png); see the compositing sketch below
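The mask postprocess composites the reconstruction back onto the original so that only the masked region changes. A minimal sketch of such compositing (illustrative; the repo's implementation may differ):

  import numpy as np
  from PIL import Image

  def apply_mask(original_path, mask_path, result):
      """Keep `result` where the mask is white, the original elsewhere.

      For an input foo.jpg the mask is foo-mask.png; `result` is a float
      HxWx3 array produced by the reconstruction.
      """
      original = np.asarray(Image.open(original_path).convert('RGB'),
                            dtype=np.float32)
      mask = np.asarray(Image.open(mask_path).convert('L'),
                        dtype=np.float32) / 255.0
      out = mask[..., None] * result + (1.0 - mask[..., None]) * original
      return Image.fromarray(out.clip(0, 255).astype(np.uint8))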