/CelebV-HQ

[ECCV 2022] CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

Primary LanguagePython

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset (ECCV 2022)

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
Hao Zhu*, Wayne Wu*, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, and Chen Change Loy
In ECCV 2022. (*Equal contribution)
Demo Video | Project Page | Paper (Coming soon)

Abstract: Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields. However, the academic community still lacks a video dataset with diverse facial attribute annotations, which is crucial for the research on face-related videos. In this work, we propose a large-scale, high-quality, and diverse video dataset with rich facial attribute annotations, named the High-Quality Celebrity Video Dataset (CelebV-HQ). CelebV-HQ contains 35,666 video clips with the resolution of 512x512 at least, involving 15,653 identities. All clips are labeled manually with 83 facial attributes, covering appearance, action, and emotion. We conduct a comprehensive analysis in terms of age, ethnicity, brightness stability, motion smoothness, head pose diversity, and data quality to demonstrate the diversity and temporal coherence of CelebV-HQ. Besides, its versatility and potential are validated on two representative tasks, i.e., unconditional video generation and video facial attribute editing. Furthermore, we envision the future potential of CelebV-HQ, as well as the new opportunities and challenges it would bring to related research directions.

Updates

  • [21/6/2022] The codebase and project page are created.

TODO

  • Data download scripts
  • Inference code
  • Pretrained models of unconditional video generation

Statistics

demo.mp4

The distributions of each attribute. CelebV-HQ has a diverse distribution on each attribute category. Overall, CelebV-HQ contains diverse facial attributes and natural distributions, bringing new opportunities and challenges to the community.

Agreement

  • The CelebV-HQ dataset is available for non-commercial research purposes only.
  • All videos of the CelebV-HQ dataset are obtained from the Internet which are not property of SenseTime Research. The SenseTime Research is not responsible for the content nor the meaning of these videos.
  • You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the videos and any portion of derived data.
  • You agree not to further copy, publish or distribute any portion of the CelebV-HQ dataset. Except, for internal use at a single site within the same organization it is allowed to make copies of the dataset.

Download

Usage:

Prepare the environment:

pip install yt-dlp
pip install opencv-python

Run script:

# you can change the download folder in the code 
python download_tools.py

JSON File Structure:

{
"meta_info": 
    {
        "appearance_mapping": ["Blurry", "Male", "Young", ...],  // appearance attributes
        "action_mapping": ["blow", "chew", "close_eyes", ...]    // action attributes
    },  

"clips": 
{
    "M2Ohb0FAaJU_1":  // clip 1 
    {
        "ytb_id": "M2Ohb0FAaJU",                                   // youtube id
        "duration": {"start_sec": 81.62, "end_sec": 86.17},        // start and end times in the original video
        "bbox": {"top": 0.0, "bottom": 0.8815, "left": 0.1964, "right": 0.6922},  // bounding box
        "attributes":                                              // attributes information
        {
            "appearance": [0, 0, 1, ...],                          // same order as the "appearance_mapping"
            "action": [0, 0, 0, ...],                              // same order as the "action_mapping"
            "emotion": {"sep_flag": false, "labels": "neutral"}    // only one emotion in the clip 
         }, 
         "version": "v0.1"
           
    },
    "_0tf2n3rlJU_0":  // clip 2 
    {
        "ytb_id": "_0tf2n3rlJU", 
        "duration": {"start_sec": 52.72, "end_sec": 56.1}, 
        "bbox": {"top": 0.0, "bottom": 0.8407, "left": 0.5271, "right": 1.0}, 
        "attributes": 
        {
            "appearance": [0, 0, 1, ...], 
            "action": [0, 0, 0, ...], 
            "emotion": 
            {
                "sep_flag": true, "labels": [                      // multi-emotion in the clip
                    {"emotion": "neutral", "start_sec": 0, "end_sec": 0.28}, 
                    {"emotion": "happy", "start_sec": 1.28, "end_sec": 3.28}]
            }
        }, 
        "version": "v0.1" 
    }
    "..."
    "..."

}

Baselines

Unconditional Video Generation

To train other baselines, we used their original implementations in our paper:

Facial Attribute Editing

Related Works

  • (ECCV 2022) StyleGAN-Human: A Data-Centric Odyssey of Human Generation, Jianglin Fu et al. [Paper], [Project Page], [Dataset]

Citation

If you find this work useful for your research, please consider citing our paper:

@inproceedings{zhu2022celebvhq,
  title={{CelebV-HQ}: A Large-Scale Video Facial Attributes Dataset},
  author={Zhu, Hao and Wu, Wayne and Zhu, Wentao and Jiang, Liming and Tang, Siwei and Zhang, Li and Liu, Ziwei and Loy, Chen Change},
  booktitle={ECCV},
  year={2022}
}

Acknowledgement

We sincerely thank Zongcai Sun for his help with source data preparation and the download tool development.