/SFHQ-dataset

Synthetic Faces High Quality (SFHQ) Dataset

Primary LanguagePythonMIT LicenseMIT

Synthetic Faces High Quality (SFHQ) dataset

SFHQ dataset sample images

This dataset consists of 3 parts, each part containing ~90,000 curated high quality 1024x1024 synthetic face images. it was created by "bringing to life" various art works (paintings, drawings, 3D models) using a process similar to what is described in this short twitter thread which involve encoding the images into StyleGAN2 latent space and performing a small manipulation that turns each image into a photo-realistic image. These candidate images were then further curated using a semi-manual semi-automatic process with the help of the lightweight visual taste aprroximator tool

The dataset also contains facial landmarks (an extended set) and face parsing semantic segmentation maps. An example script (explore_dataset.py) is provided (live kaggle notebook here) and demonstrates how to access landmarks, segmentation maps, and textually search withing the dataset (with CLIP image/text feature vectors), and also performs some exploratory analysis of the dataset.

Example illustation of landmarks and segmentation maps below:
SFHQ dataset landmarks and segmentation

Download

The dataset can be downloaded via kaggle:

  • Part 1
  • Part 2
  • Part 3 will be released sometime during October 2022

More Details about dataset generation and collection

  1. The original inspiration images are taken from:
  2. Each inspiration image was encoded by encoder4editing (e4e) into StyleGAN2 latent space (StyleGAN2 is a generative face model tained on FFHQ dataset) and multiple candidate images were generated from each inspiration image
  3. These candidate images were then further curated and verified as being photo-realistic and high quality by a single human (me) and a machine learning assistant model that was trained to approximate my own human judgments and helped me scale myself to asses the quality of all images in the dataset. The code for the tool used for this purpuse can be found here
  4. Near duplicates and images that were too similar were removed using CLIP features (no two images in the dataset have CLIP similarity score of greater than ~0.92)
  5. From each image various pre-trained features were extracted and provided here for convenience, in particular CLIP features for fast textual query of the dataset, the feaures are under pretrained_features/ folder
  6. From each image, semantic segmentation maps were extracted using Face Parsing BiSeNet and are provided in the dataset under under segmentations/ folder
  7. From each image, an extended landmark set was extracted that also contain inner and outer hairlines (these are unique landmarks that are usually not extracted by other algorithms). These landmarks were extracted using Dlib, Face Alignment and some post processing of Face Parsing BiSeNet and are provided in the dataset under landmarks/ folder
  8. NOTE: semantic segmentation and landmarks were first calculated on scaled down version of 256x256 images, and then upscaled to 1024x1024

Example of dataset generation process on artistic illustations and paintings taken from AAHQ:
SFHQ dataset paintings

Example of dataset generation process on 3D models taken from Face Synthetics, Close-Up Humans, and UIBVFED:
SFHQ dataset 3D models

Example of dataset generation process of correcting faults in face images generated by Stable Diffusion:
SFHQ dataset stable diffusion

Part 3

  • Part 3 of the dataset will be released sometime in October 2022 and will be similar in size and quality to parts 1 & 2 but be based on home-brewed protocol of StyleGAN2 image generation that involves using its mapping network and e4e encoder prior enforcing for increased fidelity and diversity of generated images. Curated as well, of course.

Demonstation of variability in the dataset

we deomonstate the variability of the images in the dataset by textual query of the dataset with CLIP ViT-L/14@336 model embeddings:

  • Hair color:
    SFHQ dataset variability haircolor

  • Age:
    SFHQ dataset variability age

  • Ethnicity:
    SFHQ dataset variability ethnicity

  • Facial expression:
    SFHQ dataset variability expression

Additional variability demonstations can be found under images/

Privacy

Since all images in this dataset are synthetically generated there are no privacy issues or license issues surrounding these images.

Summary

Overall the 3 parts of this dataset contain ~270,000 high quality and curated synthetic face images that have no privacy issues or license issues surrounding them.

This dataset contains a high degree of variability on the axes of identity, ethnicity, age, pose, expression, lighting conditions, hair-style, hair-color, facial hair. It lacks variability in accessories axes such as hats or earphones as well as various jewelry. It also doesn't contain any occlusions except the self-occlusion of hair occluding the forehead, the ears and rarely the eyes. This dataset naturally inherits all the biases of it's original datasets (FFHQ, AAHQ, Close-Up Humans, Face Synthetics, LAION-5B) and the StyleGAN2 and Stable Diffusion models.

The purpose of this dataset is to be of sufficiently high quality that new machine learning models can be trained using this data, including even generative face models such as StyleGAN. The dataset may be extended from time to time with additional supervision labels (e.g. text descriptions), but no promises.

Hope this is helpful to some of you, feel free to use as you see fit...