/Multi-Modal-CelebA-HQ-Dataset

[CVPR 2021] A large-scale face image dataset that allows text-to-image-generation, text-guided image manipulation, sketch-to-image generation, GANs for face generation and editing, image caption, and VQA.

Multi-Modal-CelebA-HQ

Paper License: MIT Maintenance PR's Welcome Images 30000

Multi-Modal-CelebA-HQ is a large-scale face image dataset that has 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image has high-quality segmentation mask, sketch, descriptive text, and image with transparent background.

Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms of text-to-image generation, text-guided image manipulation, sketch-to-image generation, image caption, and VQA. This dataset is proposed and used in TediGAN.

Data Generation

  • The textual descriptions are generated using probabilistic context-free grammar (PCFG) based on the given attributes. We create ten unique single sentence descriptions per image to obtain more training data following the format of the popular CUB dataset and COCO dataset. The previous study proposed CelebTD-HQ, but it is not publicly available.
  • For label, we use CelebAMask-HQ dataset, which contains manually-annotated semantic mask of facial attributes corresponding to CelebA-HQ.
  • For sketches, we follow the same data generation pipeline as in DeepFaceDrawing. We first apply Photocopy filter in Photoshop to extract edges, which preserves facial details and introduces excessive noise, then apply the sketch-simplification to get edge maps resembling hand-drawn sketches.
  • For background removing, we use an open-source tool Rembg and a commercial software removebg. Different backgrounds can be further added using image composition or harmonization methods like DoveNet.

Overview

image

All data is hosted on Google Drive:

Path Size Files Format Description
multi-modal-celeba ~200 GB 420,002 Main folder
├  image ~2 GB 30,000 JPG images from celeba-hq of size 512×512
├  label ~1 GB 30,000 PNG masks from celeba-mask-hq of size 512×512
├  sketch 398 MB 30,000 PNG sketches (10 samples and sketch.zip)
├  text 11 MB 30,0000 TXT 10 descriptions of each image in celeba-mask-hq
├  train 347 KB 1 PKL filenames of training images
├  test 81 KB 1 PKL filenames of test images
└  rmebg ~20 GB 30,000 PNG image with transparent background (password: 3amt)

Multi-Modal-CelebA-HQ Dataset Downloads

Pretrained Models

We provide the pretrained models of AttnGAN, ControlGAN, DMGAN, DFGAN, and ManiGAN. Feel free to pull requests if you have any updates.

Method FID LPIPIS Download
AttnGAN 125.98 0.512 Pretrained
ControlGAN 116.32 0.522 Pretrained
DFGAN 137.60 0.581 Pretrained
DM-GAN 131.05 0.544 Pretrained
TediGAN 106.37 0.456 Pretrained

Related Works

  • CelebA dataset:
    Ziwei Liu, Ping Luo, Xiaogang Wang and Xiaoou Tang, "Deep Learning Face Attributes in the Wild", in IEEE International Conference on Computer Vision (ICCV), 2015
  • CelebA-HQ was collected from CelebA and further post-processed by the following paper :
    Karras et. al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation", in Internation Conference on Reoresentation Learning (ICLR), 2018
  • CelebAMask-HQ manually-annotated masks with the size of 512 x 512 and 19 classes including all facial components and accessories such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglass, earring, necklace, neck, and cloth. It was collected by the following paper :
    Lee et. al., "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", in Computer Vision and Pattern Recognition (CVPR), 2020

To Do Lists

  • upload image with transparent background
  • remove the background of each image (release the first version at Nov.14, 2020)
  • create the 3D model for each image
  • upload the inverted codes

Dataset Agreement

  • The Multi-Modal-CelebA-HQ dataset is available for non-commercial research purposes only.
  • You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the images and any portion of derived data.
  • You agree not to further copy, publish or distribute any portion of the CelebAMask-HQ dataset. Except, for internal use at a single site within the same organization it is allowed to make copies of the dataset.

License and Citation

The use of this software is RESTRICTED to non-commercial research and educational purposes.

If you find this dataset helpful for your research, please consider to cite:

@inproceedings{xia2021tedigan,
  title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

@article{xia2021open,
  title={Towards Open-World Text-Guided Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  journal={arxiv preprint arxiv: 2104.08910},
  year={2021}
}

@inproceedings{karras2017progressive,
  title={Progressive growing of gans for improved quality, stability, and variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  journal={International Conference on Learning Representations (ICLR)},
  year={2018}
}

@inproceedings{liu2015faceattributes,
 title = {Deep Learning Face Attributes in the Wild},
 author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
 booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
 year = {2015} 
}

If you use the labels, please cite:

@inproceedings{CelebAMask-HQ,
  title={MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author={Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}