This repository is the official implementation of the paper:

**FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models**
Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang
CVPR, 2024. [Project Page] [Paper]
We first provide ready-to-use representative subsets of various sizes for the COCO and diffusionDB datasets:
🔥Representative subsets for COCO: subsets for COCO
🔥Representative subsets for diffusionDB: subsets for diffusionDB
🔥The representative subsets are also available on Huggingface🤗: subsets in Huggingface
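If you just want to load one of the provided subsets, a minimal sketch like the following should work, assuming the subset is distributed as a JSON file of prompt entries (the filename below is a placeholder for whichever file you downloaded):

```python
import json

# Placeholder filename: substitute the subset file you downloaded from
# the links above (the subsets are assumed to be JSON lists of prompts).
with open("COCO_subset.json") as f:
    subset = json.load(f)

print(f"{len(subset)} entries in the subset")
print(subset[0])  # inspect the structure of one entry
```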
Alternatively, if you want to use the FlashEval algorithm to search for subsets yourself, follow the steps below.
To install the required dependencies, use the following commands:
```bash
conda create -n flasheval python=3.9
conda activate flasheval
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
cd FlashEval
pip install -e .
```
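After installation, a quick sanity check (our suggestion, not a repo script) confirms that the CUDA build of PyTorch is visible:

```bash
# Should print the torch version and True if the GPU is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```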
As an example, we process the COCO and diffusionDB datasets used in the paper; you can easily substitute your own dataset. If you use the same model settings as in the paper, we also provide all of the processed One-time Generation data in all_metrics for convenience.
We provide the prompts of the two datasets, processed according to Section 5.1: COCO_40504.json and diffusionDB_5000.json.
For full-precision models, run the following command. For quantized models, please use q-diffusion to generate images.

```bash
python Image_generation/generate.py --model_name <choose model> --scheduler <choose scheduler> --gpu_id <GPU ID to use for CUDA> --seed <seed> --step <step> --save_dir <save_path>
```
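For concreteness, a hypothetical invocation might look like the following; the model and scheduler names below are placeholders, and the accepted values are defined inside Image_generation/generate.py:

```bash
# Placeholder model/scheduler names -- check Image_generation/generate.py
# for the choices the script actually supports.
python Image_generation/generate.py \
    --model_name sd-v1-5 \
    --scheduler dpm \
    --gpu_id 0 \
    --seed 42 \
    --step 20 \
    --save_dir outputs/sd-v1-5_dpm_20steps
```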
The metrics considered in this paper are:
- CLIP, which measures text-image alignment (see the illustrative sketch after this list)
- Aesthetic, which measures how good-looking an image is
- ImageReward, which measures human preference for an image
- HPS, which also measures human preference for an image
- FID, for which we accelerate evaluation with a batch-based approach
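To make the CLIP metric concrete, here is a minimal, self-contained sketch using the torchmetrics implementation (requires torchmetrics and transformers); it is not the repo's own evaluation code, just an illustration of what the text-image alignment score computes:

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# Illustration only -- not the repo's implementation of the CLIP metric.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Dummy uint8 image batch (N, C, H, W) standing in for generated images.
images = torch.randint(0, 255, (2, 3, 224, 224), dtype=torch.uint8)
prompts = ["a cat sitting on a sofa", "a red sports car"]

score = metric(images, prompts)  # higher = better text-image alignment
print(score)
```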
For HPS, you first need to download the pre-trained model: HPS. For the other metrics, running the evaluation command will automatically download the required models.
- For single-image evaluation metrics, run the following command:

  ```bash
  python Evaluation/test_benchmark.py --config configs/evaluation.yaml
  ```
- For FID, please download the ground-truth images from COCO and compute the score (an illustrative FID sketch follows this list).
- Organize the model scores and divide them into training and testing models. All of the processed One-time Generation data are in all_metrics.

  ```bash
  python preprocess/get_model_score.py --config configs/get_score.yaml
  ```
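As promised above, here is an illustrative FID sketch using the torchmetrics implementation (requires torchmetrics and torch-fidelity). It shows the basic FID computation; the batch-based acceleration described in the paper lives in this repo, not in this snippet:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Illustration only -- the paper's batch-based FID acceleration is in the
# repo; this shows the plain FID computation it builds on.
fid = FrechetInceptionDistance(feature=2048)

# Dummy uint8 batches (N, 3, H, W); in practice use the real COCO images
# and your generated images, with far more than 16 samples each.
real = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())  # lower = closer to the real-image distribution
```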
Finally, run the search algorithm:

```bash
python search_algorithm.py --config configs/search.yaml
```
Note: since constructing random subsets involves randomness, results may vary across runs. As the number of search iterations increases, the quality of the searched subset improves.
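If you need a repeatable run, a hypothetical seed helper like the one below (not part of the repo) can be called before launching the search; different seeds will still produce different random subsets:

```python
import random

import numpy as np
import torch

# Hypothetical helper: fixing all relevant seeds makes a single
# search run repeatable.
def set_seed(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(42)
```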
If you have any questions, please contact lllzz0309zz@gmail.com.
If you find our work useful, please cite:
```bibtex
@article{zhao2024flasheval,
  title={FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models},
  author={Zhao, Lin and Zhao, Tianchen and Lin, Zinan and Ning, Xuefei and Dai, Guohao and Yang, Huazhong and Wang, Yu},
  journal={arXiv preprint arXiv:2403.16379},
  year={2024}
}
```