pinglmlcv/Multi-Modal-CelebA-HQ-Dataset

[CVPR 2021] A large-scale face image dataset that supports text-to-image generation, text-guided image manipulation, sketch-to-image generation, GANs for face generation and editing, image captioning, and VQA.

Multi-Modal-CelebA-HQ

Multi-Modal-CelebA-HQ is a large-scale face image dataset of 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image comes with a high-quality segmentation mask, a sketch, descriptive text, and a version with a transparent background.

Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for text-to-image generation, text-guided image manipulation, sketch-to-image generation, image captioning, and VQA. This dataset was proposed and used in TediGAN.

Data Generation

The textual descriptions are generated using a probabilistic context-free grammar (PCFG) based on the given attributes. Following the format of the popular CUB and COCO datasets, we create ten unique single-sentence descriptions per image to obtain more training data. A previous study proposed CelebTD-HQ, but it is not publicly available.
For labels, we use the CelebAMask-HQ dataset, which contains manually annotated semantic masks of facial attributes corresponding to CelebA-HQ.
For sketches, we follow the same data generation pipeline as DeepFaceDrawing. We first apply the Photocopy filter in Photoshop to extract edges, which preserves facial details but introduces excessive noise, and then apply sketch simplification to obtain edge maps resembling hand-drawn sketches.
For background removal, we use the open-source tool Rembg and the commercial software removebg. Different backgrounds can then be added using image composition or harmonization methods such as DoveNet.
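
To illustrate the PCFG-based text generation described above, here is a minimal sketch of sampling unique single-sentence descriptions from a toy grammar. The grammar rules, probabilities, function names, and attribute phrases are all hypothetical and chosen for illustration only; they are not the rules used to build this dataset.

```python
import random


def build_grammar(noun, pronoun, has, wears):
    """Return a toy PCFG: nonterminal -> list of (probability, expansion).

    All rules here are illustrative assumptions, not the dataset's grammar.
    """
    if not has:
        raise ValueError("need at least one 'has' attribute phrase")
    grammar = {
        "SUBJ": [(0.5, ["This " + noun]), (0.5, [pronoun])],
        # Each attribute phrase is a terminal chosen with equal probability.
        "HAS": [(1.0 / len(has), [phrase]) for phrase in has],
    }
    if wears:
        grammar["WEARS"] = [(1.0 / len(wears), [phrase]) for phrase in wears]
        grammar["S"] = [
            (0.5, ["SUBJ", "has", "HAS", "."]),
            (0.5, ["SUBJ", "has", "HAS", "and wears", "WEARS", "."]),
        ]
    else:
        grammar["S"] = [(1.0, ["SUBJ", "has", "HAS", "."])]
    return grammar


def expand(symbol, grammar, rng):
    """Recursively expand a symbol; anything not in the grammar is a terminal."""
    if symbol not in grammar:
        return [symbol]
    weights = [prob for prob, _ in grammar[symbol]]
    expansions = [rhs for _, rhs in grammar[symbol]]
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(expand(child, grammar, rng))
    return words


def describe(grammar, n=10, seed=0, max_tries=1000):
    """Sample up to n unique single-sentence descriptions from the grammar."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(max_tries):
        text = " ".join(expand("S", grammar, rng)).replace(" .", ".")
        if text not in sentences:
            sentences.append(text)
        if len(sentences) == n:
            break
    return sentences


if __name__ == "__main__":
    toy = build_grammar(
        "woman", "She",
        has=["wavy hair", "high cheekbones", "arched eyebrows"],
        wears=["lipstick", "earrings"],
    )
    for sentence in describe(toy):
        print(sentence)
```

Rejecting duplicates until ten sentences are collected mirrors the "ten unique descriptions per image" requirement; a real grammar would need enough rules per image for ten distinct sentences to exist.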

Overview

All data is hosted on Google Drive:

Path	Size	Files	Format	Description
multi-modal-celeba	~200 GB	420,002		Main folder
├ image	~2 GB	30,000	JPG	images from celeba-hq of size 512×512
├ label	~1 GB	30,000	PNG	masks from celeba-mask-hq of size 512×512
├ sketch	398 MB	30,000	PNG	sketches (10 samples and sketch.zip)
├ text	11 MB	300,000	TXT	10 descriptions of each image in celeba-mask-hq
├ train	347 KB	1	PKL	filenames of training images
├ test	81 KB	1	PKL	filenames of test images
└ rmebg	~20 GB	30,000	PNG	image with transparent background (password: 3amt)
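
As a rough guide to working with this layout, the sketch below reads a split file and builds per-modality paths for one image. The folder names and formats come from the table above; everything else is an assumption to verify against the downloaded data: that the PKL holds a list of filenames, that all modalities share one filename stem, and that each TXT file stores one description per line. The function names are hypothetical.

```python
import pickle
from pathlib import Path

# Folder -> extension, taken from the table above.
MODALITIES = {
    "image": ".jpg",
    "label": ".png",
    "sketch": ".png",
    "text": ".txt",
    "rmebg": ".png",
}


def load_filenames(pkl_path):
    """Load a split file (assumed: a pickled list of filenames) as stems.

    Only unpickle files obtained from a source you trust.
    """
    with open(pkl_path, "rb") as f:
        names = pickle.load(f)
    return [Path(str(name)).stem for name in names]


def sample_paths(root, stem):
    """Build the path of every modality for one image (assumed shared stem)."""
    root = Path(root)
    return {folder: root / folder / f"{stem}{ext}" for folder, ext in MODALITIES.items()}


def read_captions(txt_path):
    """Read descriptions from a text file (assumed: one per non-empty line)."""
    with open(txt_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

A typical use would be to call `load_filenames` on the training split, then `sample_paths(root, stem)` for each stem, and to check that the paths exist before training.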

Multi-Modal-CelebA-HQ Dataset Downloads

Google Drive: downloading link
Baidu Drive: downloading link (password: y5w4)

Pretrained Models

We provide pretrained models of AttnGAN, ControlGAN, DM-GAN, DFGAN, and ManiGAN. Feel free to open a pull request if you have any updates.

Method	FID	LPIPS	Download
AttnGAN	125.98	0.512	Pretrained
ControlGAN	116.32	0.522	Pretrained
DFGAN	137.60	0.581	Pretrained
DM-GAN	131.05	0.544	Pretrained
TediGAN	106.37	0.456	Pretrained

Related Works

CelebA dataset:
Ziwei Liu, Ping Luo, Xiaogang Wang and Xiaoou Tang, "Deep Learning Face Attributes in the Wild", in IEEE International Conference on Computer Vision (ICCV), 2015
CelebA-HQ was collected from CelebA and further post-processed by the following paper:
Karras et al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation", in International Conference on Learning Representations (ICLR), 2018
CelebAMask-HQ provides manually annotated masks of size 512 × 512 with 19 classes covering all facial components and accessories, such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglass, earring, necklace, neck, and cloth. It was collected by the following paper:
Lee et al., "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", in Computer Vision and Pattern Recognition (CVPR), 2020

To Do Lists

upload image with transparent background
remove the background of each image (first version released on Nov. 14, 2020)
create the 3D model for each image
upload the inverted codes

Dataset Agreement

The Multi-Modal-CelebA-HQ dataset is available for non-commercial research purposes only.
You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purpose any portion of the images or any portion of derived data.
You agree not to further copy, publish, or distribute any portion of the CelebAMask-HQ dataset, except that copies of the dataset may be made for internal use at a single site within the same organization.

License and Citation

The use of this software is RESTRICTED to non-commercial research and educational purposes.

If you find this dataset helpful for your research, please consider citing:

@inproceedings{xia2021tedigan,
  title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

@article{xia2021open,
  title={Towards Open-World Text-Guided Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2104.08910},
  year={2021}
}

@inproceedings{karras2017progressive,
  title={Progressive growing of gans for improved quality, stability, and variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2018}
}

@inproceedings{liu2015faceattributes,
  title={Deep Learning Face Attributes in the Wild},
  author={Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle={Proceedings of International Conference on Computer Vision (ICCV)},
  year={2015}
}

If you use the labels, please cite:

@inproceedings{CelebAMask-HQ,
  title={MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author={Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}