
A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

MIT License


Abstract

Generative adversarial networks (GANs) have drawn enormous attention due to their simple yet effective training mechanism and superior image generation quality. With the ability to generate photo-realistic high-resolution (e.g., $1024\times1024$) images, recent GAN models have greatly narrowed the gap between generated images and real ones. Therefore, many recent works show emerging interest in taking advantage of pre-trained GAN models by exploiting their well-disentangled latent spaces and learned GAN priors. We briefly review recent progress on leveraging pre-trained large-scale GAN models from three aspects: 1) the training of large-scale generative adversarial networks, 2) exploring and understanding the pre-trained GAN models, and 3) leveraging these models for subsequent tasks such as image restoration and editing.


Figure 1. Illustration of GAN inversion methods.

In this figure, $\mathbf{x}$ and $\mathbf{\hat{x}}$ are the given real image and the generated image, respectively. The red dotted line denotes supervision. Note that the in-domain constraint requires that the generated image $\mathbf{\hat{x}}$ can be inverted back into the latent space. Here, $\mathbf{z}$ is not restricted to the $\mathcal{Z}$ space and may refer to a more generic latent code (e.g., $\mathbf{w}$, $\mathbf{f}$, etc.).

Figure Content (PDF file here)
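As Figure 1 suggests, optimization-based inversion searches for a latent code whose generation matches the given image. Below is a minimal sketch of that loop, assuming a toy linear "generator" so the gradient stays analytic; real methods backpropagate through StyleGAN/BigGAN and typically add perceptual losses. All names and dimensions here are illustrative, not from any specific method.

```python
import numpy as np

# Toy stand-in for a pre-trained generator: a frozen linear map G(z) = A z.
# Real methods use StyleGAN/BigGAN; this only illustrates the optimization loop.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 4))  # frozen "generator" weights

def G(z):
    return A @ z

z_true = rng.standard_normal(4)   # latent code behind the "given image"
x = G(z_true)

# Optimization-based ("O") inversion: minimize ||G(z) - x||^2 over z.
z = np.zeros(4)
lr = 0.01
for _ in range(500):
    grad = 2 * A.T @ (G(z) - x)   # analytic gradient of the reconstruction loss
    z -= lr * grad
```

With a real generator, `grad` comes from automatic differentiation, and the loss usually combines pixel-wise and perceptual terms.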

Figure 2. A summary of relevant papers. To get the raw file, please refer to ProcessOn.com (passcode: 1qaz)

Figure 3. Illustration of recent GAN models (see (a)$\sim$(d)) and the latent spaces of StyleGAN series (see (e)).
(a) For PGGAN, the blue part denotes the progressive growing procedure from $4\times4$ to $8\times8$. The components with dashed lines are employed for the fade-in strategy, where $\alpha$ gradually grows to 1. They are discarded when the model grows to a higher resolution. (b) For BigGAN, a specific noise is delivered to each layer together with the class embedding, and the model is trained end-to-end without the progressive growing procedure. (c) For StyleGAN, a series of FC layers is deployed to map $\mathbf{z}$ into $\mathbf{w}$. The green part belongs only to StyleGAN2. (d) For StyleGAN3, the generator is largely modified to improve translation and rotation equivariance. The discriminator is omitted since it is identical to that used in StyleGAN2. (e) For simplicity, we take the StyleGAN series as an example to show the latent spaces involved in the GAN inversion task.
Figure Content (PDF file here)
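To make the $\mathcal{Z}$/$\mathcal{W}$/$\mathcal{W}+$ distinction in (c) and (e) concrete, here is a minimal sketch of the mapping network, assuming the 8 FC layers of width 512 used by StyleGAN and the 18 style inputs of a $1024\times1024$ generator; the weights are random placeholders, not pre-trained values.

```python
import numpy as np

# Minimal sketch of StyleGAN's mapping network: FC layers turning a latent
# code z (Z-space) into a style code w (W-space). Weights here are random
# placeholders; a pre-trained model would supply the real ones.
rng = np.random.default_rng(0)
dim = 512
layers = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(8)]

def mapping(z):
    h = z / np.sqrt(np.mean(z**2) + 1e-8)      # pixel-wise normalization of z
    for W in layers:
        h = np.maximum(0.2 * (W @ h), W @ h)   # leaky ReLU (slope 0.2)
    return h

z = rng.standard_normal(dim)
w = mapping(z)                 # one shared style code: W-space
w_plus = np.tile(w, (18, 1))   # one code per style input: W+-space (18 x 512);
                               # GAN inversion methods let these rows differ
```

Allowing the 18 rows of `w_plus` to differ is what gives $\mathcal{W}+$ its extra expressiveness for inversion, at the cost of weaker regularization.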

Table 1. A summary of GAN inversion and methods leveraging pre-trained GANs for image editing and restoration.

For the inversion method, "O", "L", and "T" represent optimization-based, learning-based, and training-based (or fine-tuning) methods, respectively, while "/" means no inversion is performed in the method, and a number (without square brackets) is the index (in this table) of the method used for inversion. Note that the methods are ordered (roughly) by the time they became publicly accessible (e.g., the appearance time on arXiv, OpenReview, CVF Open Access, etc.).

Abbreviations

$^\ast$ Abbreviations: AD (ADE20K), AF (AFHQ), CA (CelebA), CD (CACD), CF (CIFAR), CH (CelebA-HQ), CM (CelebAMask-HQ), CO (MS COCO), CS (CityScapes), CU (Caltech-UCSD Birds), DA (Danbooru, aka Anime Faces), DF (DeepFashion), FF (FFHQ), FL (Flowers), IN (ImageNet), LF (LFW), LS (LSUN), MF (MetFaces), MN (MNIST), OM (Omniglot), P3 (Places365), PL (Places), PT (Oxford-IIIT Pet, aka Cats and Dogs), RA (RAVDESS), SC (Stanford Cars), SS (Streetscape), SV (SVHN), TR (Transient), UT (UT Zappos50K)

$^\dagger$ Abbreviations: AD (Adversarial Defense), AE (Attribute Editing, i.e., w/o reference), AN (Anomaly Detection), AR (Artifacts Removal), AT (Attribute Transfer, i.e., w/ reference), CO (Image Crossover), [U]DA ([Unsupervised] Domain Adaptation), DN (Image Denoising), FF (Face Frontalization), IC (Image Colorization), IG (Image Generation), IH (Information Hiding), Int (Interpolation), Inv (Inversion), IP (Inpainting), PI (Parsing or Segmentation to Image), SI (Sketch to Image), SR (Image Super-resolution), ST (Style Transfer), TR (Transform and Random Jittering).

$^\ddagger$ Some custom datasets collected or regenerated by the authors are omitted since they are not publicly available or can be generated automatically based on current public datasets.

Table Content
No. Method Publication Backbone Latent Space Inversion Method Dataset$^\ast$ Application$^\dagger$
1 BiGAN (Link) (Code) ICLR 2017 / $\mathcal{Z}$ T MN, IN Inv
2 ALI (Link) (Code) ICLR 2017 / $\mathcal{Z}$ T CF, SV, CA, IN Inv, Int
3 Zhu et al. (Link) (Code) ECCV 2016 DCGAN $\mathcal{Z}$ L, O SH, LS, PL$^\ddagger$ Inv, Int, AE
4 IcGAN (Link) (Code) NeurIPSw 2016 cGAN $\mathcal{Z}$, $\mathcal{C}$ L MN, CA Inv, AT, AE
5 Creswell et al. (Link) (Code) T-NNLS 2018 DCGAN, WGAN-GP $\mathcal{Z}$ O OM, UT, CA Inv
6 Lipton et al. (Link) (Code) ICLRw 2017 DCGAN $\mathcal{Z}$ O CA Inv
7 PGD-GAN (Link) (Code) ICASSP 2018 DCGAN $\mathcal{Z}$ O MN, CA Inv
8 Ma et al. (Link) (Code) NeurIPS 2018 DCGAN $\mathcal{Z}$ O MN, CA Inv, IP
9 Suzuki et al. (Link) (Code) ArXiv 2018 SNGAN, BigGAN, StyleGAN $\mathcal{F}$ 3 IN, FL, FF, DA CO
10 GANDissection (Link) (Code) ICLR 2019 PGGAN $\mathcal{F}$ / LS, AD AE, AR
11 NPGD (Link) (Code) ICCV 2019 DCGAN, SAGAN $\mathcal{Z}$ L, O MN, CA, LS Inv, SR, IP
12 Image2StyleGAN (Link) (Code) ICCV 2019 StyleGAN $\mathcal{W}+$ O FF$^\ddagger$ Inv, Int, AE, ST
13 Bau et al. (Link) (Code) ICLRw 2019 PGGAN, WGAN-GP, StyleGAN $\mathcal{Z}$, $\mathcal{W}$ L, O LS Inv
14 GANPaint (Link) (Demo) ToG 2019 PGGAN $\mathcal{Z}$, $\Theta$ L, O, T LS Inv, AE
15 InterFaceGAN (Link) (Code) CVPR 2020 PGGAN, StyleGAN $\mathcal{Z}$, $\mathcal{W}$ 3, 8 CH AE, AR
16 GANSeeing (Link) (Code) ICCV 2019 PGGAN, WGAN-GP, StyleGAN $\mathcal{Z}$, $\mathcal{W}$ 13 LS Inv
17 YLG (Link) (Code) CVPR 2020 SAGAN $\mathcal{Z}$ O IN Inv
18 Image2StyleGAN++ (Link) (Video) CVPR 2020 StyleGAN $\mathcal{W}+$, $\mathcal{N}$ O LS, FF Inv, CO, IP, AE, ST
19 mGANPrior (Link) (Code) CVPR 2020 PGGAN, StyleGAN $\mathcal{Z}$ O FF, CH, LS Inv, IC, SR, IP, DN, AE
20 MimicGAN (Link) IJCV 2020 DCGAN $\mathcal{Z}$ O CA, FF, LF Inv, UDA, AD, AN
21 PULSE (Link) (Code) CVPR 2020 StyleGAN $\mathcal{Z}$ O FF, CH Inv, SR
22 DGP (Link) (Code) ECCV 2020 BigGAN $\mathcal{Z}$ O, T IN, P3 Inv, Int, IC, IP, SR, AD, TR, AE
23 StyleGAN2Distillation (Link) (Code) ECCV 2020 StyleGAN2, pix2pixHD $\mathcal{W}+$ / FF AT, AE
24 EditingInStyle (Link) (Code) CVPR 2020 PGGAN, StyleGAN, StyleGAN2 $\mathcal{F}$ / FF, LS AT
25 StyleRig (Link) (Video) CVPR 2020 StyleGAN $\mathcal{W}+$ / FF AT
26 ALAE (Link) (Code) CVPR 2020 StyleGAN $\mathcal{W}$ T MN, FF, LS, CH Inv, AT
27 IDInvert (Link) (Code) ECCV 2020 StyleGAN $\mathcal{W}+$ L, O FF, LS Inv, Int, AE, CO
28 pix2latent (Link) (Code) ECCV 2020 BigGAN, StyleGAN2 $\mathcal{Z}$ O IN, CO, CF, LS Inv, TR, AE
29 IDDisentanglement (Link) (Code) ToG 2020 StyleGAN $\mathcal{W}$ L FF Inv, AT
30 WhenAndHow (Link) ArXiv 2020 MLP $\mathcal{Z}$ O MN Inv, IP
31 Guan et al. (Link) ArXiv 2020 StyleGAN $\mathcal{W}+$ L, O CH, CD Inv, Int, AT, IC
32 SeFa (Link) (Code) CVPR 2021 PGGAN, BigGAN, StyleGAN $\mathcal{Z}$ 19, 27 FF, CH, LS, IN, SS, DA AE
33 GH-Feat (Link) (Code) CVPR 2021 StyleGAN $\mathcal{S}$ L MN, FF, LS, IN Inv, AT, AE
34 pSp (Link) (Code) CVPR 2021 StyleGAN2 $\mathcal{W}+$ L FF, AF, CH, CM Inv, FF, SI, SR
35 StyleFlow (Link) (Code) ToG 2021 StyleGAN, StyleGAN2 $\mathcal{W}+$ 12 FF, LS AT, AE
36 PIE (Link) (Code) ToG 2020 StyleGAN $\mathcal{W}+$ O FF AT, AE
37 Bartz et al. (Link) (Code) BMVC 2020 StyleGAN, StyleGAN2 $\mathcal{Z}$, $\mathcal{W}+$ L FF, LS Inv, DN
38 StyleIntervention (Link) ArXiv 2020 StyleGAN2 $\mathcal{S}$ O FF Inv, AE
39 StyleSpace (Link) (Code) CVPR 2021 StyleGAN2 $\mathcal{S}$ O FF, LS Inv, AE
40 Hijack-GAN (Link) (Code) CVPR 2021 PGGAN, StyleGAN $\mathcal{Z}$ / CH AE
41 NaviGAN (Link) (Code) CVPR 2021 pix2pixHD, BigGAN, StyleGAN2 $\Theta$ StyleGAN2 FF, LS, CS, IN AE
42 GLEAN (Link) (Code) CVPR 2021 StyleGAN $\mathcal{W}+$ L FF, LS Inv, SR
43 ImprovedGANEmbedding (Link) (Code) ArXiv 2020 StyleGAN, StyleGAN2 $\mathcal{P}$ O FF, MF$^\ddagger$ Inv, IC, IP, SR
44 GFPGAN (Link) (Code) CVPR 2021 StyleGAN2 $\mathcal{W}$ L FF Inv, SR
45 EnjoyEditing (Link) (Code) ICLR 2021 PGGAN, StyleGAN2 $\mathcal{Z}$ 12 FF, CA, CH, P3, TR Inv, AE
46 SAM (Link) (Code) ToG 2021 StyleGAN $\mathcal{W}+$ L CA, CH AE
47 e4e (Link) (Code) ToG 2021 StyleGAN2 $\mathcal{W}+$ L FF, CH, LS, SC Inv, AE
48 StyleCLIP (Link) (Code) ICCV 2021 StyleGAN2 $\mathcal{W}+$, $\mathcal{S}$ 47, O FF, CH, LS, AF AE
49 LatentComposition (Link) (Code) ICLR 2021 PGGAN, StyleGAN2 $\mathcal{Z}$ L FF, CH, LS Inv, IP, AT
50 GANEnsembling (Link) (Code) CVPR 2021 StyleGAN2 $\mathcal{W}+$ L, O CH, SC, PT Inv, AT
51 ReStyle (Link) (Code) ICCV 2021 StyleGAN2 $\mathcal{W}+$ L FF, CH, SC, LS, AF Inv, AE
52 E2Style (Link) (Code) T-IP 2022 StyleGAN2 $\mathcal{W}+$ L FF, CH Inv, SI, PI, AT, IP, SR, AE, IH
53 GPEN (Link) (Code) CVPR 2021 StyleGAN2 $\mathcal{W}+$, $\mathcal{N}$ L FF, CH Inv, SR
54 Consecutive (Link) (Code) ICCV 2021 StyleGAN $\mathcal{W}+$ O FF, RA Inv, Int, AE
55 BDInvert (Link) (Code) ICCV 2021 StyleGAN, StyleGAN2 $\mathcal{F}$/$\mathcal{W}+$ O FF, CH, LS Inv, AE
56 HFGI (Link) (Code) CVPR 2022 StyleGAN2 $\mathcal{W}+$, $\mathcal{F}$ L FF, CH, SC Inv, AE
57 VisualVocab (Link) (Code) ICCV 2021 BigGAN $\mathcal{Z}$ / P3, IN AE
58 HyperStyle (Link) (Code) CVPR 2022 StyleGAN2 $\mathcal{W}+$ L FF, CH, AF Inv, AE, ST
59 GANGealing (Link) (Code) CVPR 2022 StyleGAN2 $\mathcal{W}$ / LS, FF, AF, CH, CU TR
60 HyperInverter (Link) (Code) CVPR 2022 StyleGAN2 $\mathcal{W}$, $\Theta$ L FF, CH, LS Inv, Int, AE
61 InsetGAN (Link) (Code) CVPR 2022 StyleGAN2 $\mathcal{W}+$ O FF, DF$^\ddagger$ CO, IG
62 HairMapper (Link) (Code) CVPR 2022 StyleGAN2 $\mathcal{W}+$ 47 FF, CM$^\ddagger$ AE
63 SAMInv (Link) (Code) CVPR 2022 BigGAN-deep, StyleGAN2 $\mathcal{W}+$, $\mathcal{F}$ L FF, LS, IN Inv, AE
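Complementing the "Inversion Method" column above: "O" optimizes the latent code per image, "L" trains an encoder once and inverts with a single forward pass, and "T" fine-tunes the generator itself. Reusing a toy linear "generator" (an illustrative assumption, not any listed method), the learning-based family can be sketched as a least-squares encoder fit:

```python
import numpy as np

# Learning-based ("L") inversion: train an encoder once on (z, G(z)) pairs,
# then invert new images without per-image optimization. The linear
# "generator" below is an illustrative stand-in for a pre-trained GAN.
rng = np.random.default_rng(1)
A = rng.standard_normal((16, 4))      # frozen "generator": G(z) = A z

Z = rng.standard_normal((1000, 4))    # sampled latent codes
X = Z @ A.T                           # corresponding "images" G(z)

# Fit a linear encoder E(x) = x @ E_w so that E(G(z)) ~= z on the samples.
E_w, *_ = np.linalg.lstsq(X, Z, rcond=None)

z_new = rng.standard_normal(4)
x_new = A @ z_new                     # unseen generated image
z_hat = x_new @ E_w                   # inversion in one forward pass
```

Encoders such as pSp, e4e, or ReStyle play the same role with deep networks trained under reconstruction, perceptual, and identity losses.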

Contributions

Pull requests are welcome for error correction and content expansion!

Tips:

  1. The tables in LaTeX and Markdown can be generated with tablesgenerator.com
  2. You can download our table content from here and load it (or your own CSV files) in tablesgenerator.com

Citation

Please find more details in our paper. If you find it useful, please consider citing:

@article{liu2022pretrainedGANs,
  title={A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration},
  author={Liu, Ming and Wei, Yuxiang and Wu, Xiaohe and Zuo, Wangmeng and Zhang, Lei},
  journal={arXiv preprint arXiv:2207.10309},
  year={2022}
}