
HIDA: Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation

Abstract

Facial sketch synthesis (FSS) aims to generate a vivid sketch portrait from a given facial photo. Existing FSS methods merely rely on 2D representations of facial semantics or appearance. However, professional human artists usually use outlines or shadings to convey 3D geometry, so facial 3D geometry (e.g., a depth map) is extremely important for FSS. Besides, different artists may use diverse drawing techniques and create multiple styles of sketches, but within a single sketch the style is globally consistent. Inspired by such observations, in this paper we propose a novel *Human-Inspired Dynamic Adaptation* (HIDA) method. Specifically, we propose to dynamically modulate neuron activations based on a joint consideration of both facial 3D geometry and 2D appearance, as well as globally consistent style control. Besides, we use deformable convolutions at coarse scales to align deep features, for generating abstract and distinct outlines. Experiments show that HIDA can generate high-quality sketches in multiple styles, and significantly outperforms previous methods over a large range of challenging faces. Besides, HIDA allows precise style control of the synthesized sketch, and generalizes well to natural scenes. Our code will be released after peer review.
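To make the mechanism concrete, here is a minimal PyTorch sketch of dynamic modulation conditioned on depth, appearance, and a global style code, in the spirit of SPADE-style normalization. Everything here (`DynamicModulation`, `style_dim`, the 1-channel depth input) is an illustrative assumption, not the released HIDA implementation.

```python
# Illustrative sketch only -- NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicModulation(nn.Module):
    """Modulate activations using 3D geometry (depth), 2D appearance
    (photo), and a globally shared style code."""
    def __init__(self, feat_ch, cond_ch=4, style_dim=64, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        # Spatial branch: depth (1 ch) + photo (3 ch) -> per-pixel scale/shift.
        self.shared = nn.Sequential(
            nn.Conv2d(cond_ch, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        # Global branch: one scale/shift pair shared by all pixels, so the
        # chosen style stays consistent over the whole sketch.
        self.style = nn.Linear(style_dim, 2 * feat_ch)

    def forward(self, feat, photo, depth, style_code):
        cond = torch.cat([photo, depth], dim=1)
        cond = F.interpolate(cond, size=feat.shape[2:], mode='bilinear',
                             align_corners=False)
        h = self.shared(cond)
        g_sty, b_sty = self.style(style_code).chunk(2, dim=1)
        g_sty, b_sty = g_sty[:, :, None, None], b_sty[:, :, None, None]
        return (self.norm(feat) * (1 + self.gamma(h) + g_sty)
                + self.beta(h) + b_sty)

# Shape check: 256-channel features at 32x32, conditioned on a 256x256 photo.
mod = DynamicModulation(feat_ch=256)
out = mod(torch.randn(1, 256, 32, 32), torch.randn(1, 3, 256, 256),
          torch.randn(1, 1, 256, 256), torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 256, 32, 32])
```

The spatial branch lets geometry and appearance modulate activations per pixel, while the global branch applies a single scale/shift so the drawing style cannot drift across the image.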

Paper Information

Fei Gao, Yifan Zhu, Chang Jiang, and Nannan Wang, "Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation," in Proceedings of the International Conference on Computer Vision (ICCV), 2023.

Citation

If you use this code for your research, please cite our paper.

@inproceedings{gao2023human,
  title={Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation},
  author={Gao, Fei and Zhu, Yifan and Jiang, Chang and Wang, Nannan},
  booktitle={Proceedings of the International Conference on Computer Vision (ICCV)},
  pages={},
  year={2023}
}

Pipeline

(Pipeline overview figure)

Sample Results

  • Comparison with SOTAs on the FS2K dataset:

(Comparison figures on three example faces from FS2K)

(a) Photo (b) Depth (c) Ours (d) Pix2PixHD (e) FSGAN (f) SCA-GAN (g) GT (h) Pix2Pix (i) MDAL (j) CycleGAN (k) GENRE

  • Performance on faces in-the-wild:

(Results on three in-the-wild faces)

(a) Photo (b) Ours (Style 1) (c) Ours (Style 2) (d) Ours (Style 3) (e) GENRE (f) Pix2Pix (g) CycleGAN (h) SCA-GAN

  • Performance of our DISC model on natural images:

(Results on a cat photo and two building photos)

(a) Photo (b) Depth (c) Ours (Style 1) (d) Ours (Style 2) (e) Ours (Style 3)

  • Extension to pen drawings and oil paintings: (example figure)

  • More Results:

We offer more results here: https://drive.google.com/file/d/1vT0nqEVVByBW1QltYVX_mIYCcZ4wXsQD/view?usp=sharing

Prerequisites

  • Linux or macOS
  • Python 3.8.12
  • Pytorch-lightning 0.7.5
  • CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

  • Clone this repo:

    git clone https://github.com/AiArt-HDU/DISC
    cd DISC
    
  • Install PyTorch 1.7.1 and torchvision from http://pytorch.org and other dependencies (e.g., visdom and dominate). You can install all the dependencies by

    pip install -r requirements.txt
    
  • The DCN-V2 dependency is more involved to install; please refer to the official DCN-V2 repository for build instructions. A hedged alternative sketch using torchvision's built-in deformable convolution follows below.
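If building DCN-V2 is a blocker, torchvision ships a deformable convolution operator that can stand in for quick experiments. The sketch below only illustrates how deformable convolutions can align deep features; it is not the repo's module, and it uses plain (v1-style) offsets because modulated v2 masks require torchvision >= 0.9.

```python
# Hedged alternative sketch using torchvision's built-in operator.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Two offsets (x, y) per kernel sampling position.
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        nn.init.zeros_(self.offset_pred.weight)  # start as a plain conv
        nn.init.zeros_(self.offset_pred.bias)
        self.dcn = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):
        offset = self.offset_pred(x)  # learned sampling offsets
        return self.dcn(x, offset)

feat = torch.randn(1, 64, 32, 32)
print(DeformAlign(64)(feat).shape)  # torch.Size([1, 64, 32, 32])
```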

Apply a pre-trained model

  • A face photo→sketch model pre-trained on the FS2K dataset
  • The pre-trained model needs to be saved at ./checkpoint
  • Then you can test the model (a quick sanity check is sketched below)
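A quick, hypothetical sanity check that the weights are in place (the checkpoint filename below is illustrative; use the actual name of the released file):

```python
# Hypothetical check -- 'hida_fs2k.pth' is a placeholder filename.
import os
import torch

ckpt_path = './checkpoint/hida_fs2k.pth'
assert os.path.isfile(ckpt_path), 'Download the pre-trained model first.'
state = torch.load(ckpt_path, map_location='cpu')
print(type(state))  # typically a state_dict or a dict of sub-networks
```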

Train/Test

  • Download the FS2K dataset here

  • Train a model

    python train.py --root your_root_path_train
    
  • Test the model: please first prepare depth maps for your test data using the 3DDFA method

    python test.py --data_dir your_data_path_test --depth_dir your_depth_path_test 
    
  • If you want to train on your own data, please first align your pictures and prepare their depth maps according to the tutorial in the Preprocessing steps section below.

Preprocessing steps

Face photos (and paired drawings) need to be aligned and paired with depth maps; our training code requires depth maps computed after alignment.

In our work, depth maps are generated by the method in [1]:

  • First, align, resize, and crop the face photos (and corresponding drawings) to 250×250 (a naive sketch of this step follows below).
  • Then, use the code in 3DDFA to generate depth maps for the face photos and drawings.
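For reference, here is a naive sketch of the 250×250 resize/crop step. Real preprocessing should align faces using facial landmarks (e.g., from 3DDFA); the center crop below is only an assumption for illustration.

```python
# Naive center-crop-and-resize sketch; real alignment uses facial landmarks.
from PIL import Image

def center_crop_resize(src_path, dst_path, size=250):
    img = Image.open(src_path).convert('RGB')
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img.resize((size, size), Image.BICUBIC).save(dst_path)

center_crop_resize('photo.jpg', 'photo_250.jpg')
```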

[1] J. Guo, X. Zhu, Y. Yang, F. Yang, Z. Lei, and S. Z. Li, “Towards fast, accurate and stable 3d dense face alignment,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020.


Acknowledgments

Our code is inspired by pytorch-CycleGAN-and-pix2pix, GENRE, and CoCosNet.