
[ECCV 2022] Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer.


[ECCV 2022] TargetCLIP: Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer

This repository finds a global direction in StyleGAN's latent space that edits images according to a target image, transferring the essence of the target to any source image.

Pretrained directions notebooks:

Notebook for celebrity sources or your own pre-inverted latents:

Open In Colab Open In YouTube

The notebook lets you apply the pretrained directions to the sources presented in the examples. You can also edit your own inverted images with the pretrained directions by uploading your latent vector to the dirs folder. We use images inverted by e4e.

Notebook for e4e+TargetCLIP (inversion and manipulation in one notebook):

Open In Colab

Examples:

NOTE: all the examples presented are available in our Colab notebook. The recommended coefficient is between 0.5 and 1.
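To make the role of the coefficient concrete, here is a minimal sketch of how a direction might be applied to a source latent. It assumes the edit is a simple additive shift of the latent, scaled by the coefficient, which this README does not spell out. The function name, the random placeholder data, and the (18, 512) latent shape are illustrative assumptions rather than the repository's actual API, and NumPy stands in for the PyTorch tensors the notebooks use.

```python
import numpy as np


def apply_direction(source_latent, direction, coefficient=1.0):
    """Shift a source latent along a direction, scaled by the coefficient.

    Assumption: the edit is additive (latent + coefficient * direction).
    """
    return source_latent + coefficient * direction


# Placeholder data standing in for a real inverted latent and a trained direction.
# The (18, 512) shape is an assumed W+ layout, not taken from this README.
rng = np.random.default_rng(0)
source_latent = rng.standard_normal((18, 512))
direction = rng.standard_normal((18, 512))

# Sweep over the recommended coefficient range.
edited = {c: apply_direction(source_latent, direction, c) for c in (0.5, 0.75, 1.0)}
```

Each edited latent would then be passed through the StyleGAN generator to produce the manipulated image. Larger coefficients move the source further along the direction.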

Targets that were not inverted: The Joker and Keanu Reeves

The targets are plain images that were not inverted; the direction optimization is initialized at random.

NOTE: for the Joker, we use relatively large coefficients (0.9-1.3).

Out-of-domain targets: Elsa and Pocahontas

The targets are plain images from outside the domain StyleGAN was trained on; the direction optimization is initialized at random.

Targets that were inverted: Trump

The targets are inverted images, and the latents are used as initialization for the optimization.

Reproducing results

Downloading pretrained weights

First, please download all the pretrained weights for the experiments to the folder pretrained_models. If you choose to save the pretrained weights in another path, please update the config file accordingly (configs/paths_config.py). Our tests require the pretrained StyleGAN2 weights and the pretrained ArcFace weights. For encoder finetuning and optimizer initialization, please also download the e4e pretrained weights.

To enable alignment, run the following:

wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -dk shape_predictor_68_face_landmarks.dat.bz2

Training the optimizer and the encoder

Downloading datasets

The targets for our celebrities test can be found here. To train the encoder, please download the CelebA-HQ dataset (both the train set and the test set). For the FFHQ tests, also download the FFHQ dataset and extract the first 50 images from it.

Training directions with the optimizer

Run the following command:

PYTHONPATH=`pwd` python optimization.py --target_path /path/to/target/image --output_folder path/to/optimizer/output  --lambda_transfer 1 --weight_decay 3e-3 --lambda_consistency 0.5 --step 1000 --lr 0.2 --num_directions 1 --num_images 4 

where num_directions is the number of different directions you wish to train, and num_images is the number of images to use in the consistency tests. Use the random_initiate parameter to initialize the direction randomly instead of from the inversion of the target. The resulting manipulations of the training sources, as well as the produced essence directions, will be saved under output_folder.
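If you want to train directions for several targets in a row, the command above can be assembled programmatically. The sketch below only builds and prints the command lines; it reuses the flags and values shown above, while the helper name, the target file names, and the output layout are hypothetical.

```python
import shlex
from pathlib import Path


def build_optimization_command(target_path, output_folder, num_directions=1, num_images=4):
    """Return the argument list for one optimization.py run, mirroring the command above."""
    return [
        "python", "optimization.py",
        "--target_path", str(target_path),
        "--output_folder", str(output_folder),
        "--lambda_transfer", "1",
        "--weight_decay", "3e-3",
        "--lambda_consistency", "0.5",
        "--step", "1000",
        "--lr", "0.2",
        "--num_directions", str(num_directions),
        "--num_images", str(num_images),
    ]


# Hypothetical target images; replace these with your own paths.
targets = ["targets/joker.png", "targets/elsa.png"]

# One output folder per target, named after the image file (an assumed layout).
commands = [
    build_optimization_command(target, Path("output") / Path(target).stem)
    for target in targets
]

for command in commands:
    print(shlex.join(command))
```

To actually launch a run from the repository root, each list could be passed to `subprocess.run(command, env={**os.environ, "PYTHONPATH": os.getcwd()}, check=True)`, which matches the `PYTHONPATH` prefix used in the shell command.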

Training the encoder from scratch

  1. Download ninja=1.10.0, using the following commands:
wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
sudo unzip ninja-linux.zip -d /usr/local/bin/
sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force
  2. Randomly select 200 images from the CelebA-HQ train set and place them in: data/celeba_minimized.
  3. Randomly select 50 images from the CelebA-HQ test set and place them in: data/data1024x1024/test.
  4. We train our encoder on 5 RTX 2080 Ti GPUs with 11 GB each. To train the encoder from scratch, run the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4 PYTHONPATH=`pwd` python scripts/train.py --exp_dir name/of/experiment/directory --lambda_consistency 0.5 --batch_size 1 --test_batch_size 1 --lambda_reg 3e-3 --checkpoint_path pretrained_models/e4e_ffhq_encode.pt --image_interval 1 --board_interval 5 --val_interval 31 --dataset_type celeba_encode_minimized --save_interval 200 --max_steps 3000

If you wish to train the encoder with a single GPU, please remove the use of DataParallel in the coach file (training/coach). The best checkpoint will be saved to name/of/experiment/directory/checkpoints.

Important: Please make sure to download the pretrained e4e weights before training in order to enable the finetuning.
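The random selection of training and test images described above could be scripted as follows. This is a minimal sketch using only the standard library; the helper name, the fixed seed, and the accepted file extensions are assumptions, and the source paths in the usage comment are placeholders for wherever you extracted CelebA-HQ.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def sample_images(src_dir, dst_dir, count, seed=0):
    """Copy `count` randomly chosen images from src_dir into dst_dir.

    Returns the sorted list of copied file names. A fixed seed keeps the
    selection reproducible between runs.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    images = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if count > len(images):
        raise ValueError(f"requested {count} images but found only {len(images)}")
    chosen = random.Random(seed).sample(images, count)
    dst.mkdir(parents=True, exist_ok=True)
    for image in chosen:
        shutil.copy2(image, dst / image.name)
    return sorted(image.name for image in chosen)


# Example usage (source paths are placeholders):
# sample_images("/path/to/celeba_hq/train", "data/celeba_minimized", 200)
# sample_images("/path/to/celeba_hq/test", "data/data1024x1024/test", 50)
```

The same helper could also draw the random FFHQ subset needed for the FID test.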

Producing quantitative results (id scores, semantic scores)

  1. The latents for our 68 sources are saved under pretrained_weights/celebs.pt.
  2. Use your method to produce a manipulation for each (source, target) pair, and save the manipulation results under a folder named after the baseline. The naming convention our tests expect is {target_name}/{source_idx}.png. For example, the manipulation for ariel with source number 1 will be saved as: {baseline_name}/ariel/1.png.
  3. Produce results by running the following command:
PYTHONPATH=`pwd` python ./experiments/calc_metrics.py --style_img_path /path/to/target/images --manipulations_path /output/folder --input_img_path /path/to/source/images

where style_img_path is the path to the target images, manipulations_path is the path to the results of the manipulations, and input_img_path is the path to the 68 source images.
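Before running the metrics script, it can help to check that your output folder follows the expected naming convention. The sketch below builds the expected path for each (target, source) pair and lists the files that are missing. The helper names are hypothetical, and because this README does not state whether source indices start at 0 or 1, the indices are passed in explicitly.

```python
from pathlib import Path


def manipulation_path(root, baseline_name, target_name, source_idx):
    """Expected location of one result: {baseline_name}/{target_name}/{source_idx}.png."""
    return Path(root) / baseline_name / target_name / f"{source_idx}.png"


def missing_manipulations(root, baseline_name, target_names, source_indices):
    """Return the expected result files that do not exist yet."""
    missing = []
    for target_name in target_names:
        for source_idx in source_indices:
            path = manipulation_path(root, baseline_name, target_name, source_idx)
            if not path.is_file():
                missing.append(path)
    return missing


# Example usage (root folder, baseline name, and index range are placeholders):
# missing = missing_manipulations("/output/folder", "my_baseline", ["ariel"], range(68))
# print(f"{len(missing)} manipulations still missing")
```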

Important: Please note that our optimizer also finds coefficients per source. In our experiments, we found that the average coefficient across targets is usually about 1.2, so we used this value when manipulating new sources (in both the celebrities and FFHQ experiments).

Producing FID

To run the FID test, follow these steps:

  1. Install the FID calculation package.
  2. Extract a random subset of size 7000 from the FFHQ test set.
  3. For each target name, the folder {baseline}/target_name needs to be compared to the subset of FFHQ:
python -m pytorch_fid --device cuda:{gpu_device} /path/to/FFHQ /outdir/target_name
  4. Calculate the average and standard deviation across the FID scores of all targets.
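The final aggregation step can be done with the standard library once you have one FID score per target. In the sketch below, the scores are made-up placeholders rather than results from the paper, the helper name is hypothetical, and the use of the population standard deviation is an assumption, since this README does not say which variant was reported.

```python
import statistics


def summarize_fid(fid_by_target):
    """Return (mean, standard deviation) over the per-target FID scores.

    Assumption: population standard deviation (statistics.pstdev). Swap in
    statistics.stdev if the sample standard deviation is preferred.
    """
    scores = list(fid_by_target.values())
    return statistics.mean(scores), statistics.pstdev(scores)


# Placeholder values only; substitute the scores printed by pytorch_fid.
fid_by_target = {"target_a": 10.0, "target_b": 20.0, "target_c": 30.0}
mean_fid, std_fid = summarize_fid(fid_by_target)
print(f"FID: {mean_fid:.2f} +/- {std_fid:.2f}")
```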

Citing our paper

If you make use of our work, please cite our paper:

@article{chefer2021targetclip,
  title={Image-Based CLIP-Guided Essence Transfer},
  author={Chefer, Hila and Benaim, Sagie and Paiss, Roni and Wolf, Lior},
  journal={arXiv preprint arXiv:2110.12427},
  year={2021}
}

Credits

The code in this repo draws from the StyleCLIP and e4e codebases.