This repository is the official implementation of the paper:

**FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models**
Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang
CVPR, 2024. [Project Page] [Paper]
We first provide ready-to-use representative subsets of various sizes for the COCO and diffusionDB datasets:
🔥Representative subsets for COCO: subsets for COCO
🔥Representative subsets for diffusionDB: subsets for diffusionDB
🔥The representative subsets are also available on Huggingface🤗: subsets in Huggingface
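If you just want to load one of the provided subsets, a minimal sketch like the following should work, assuming the subset is distributed as a JSON file of prompt entries (the filename below is a placeholder for whichever file you downloaded):

```python
import json

# Placeholder filename: substitute the subset file you downloaded from
# the links above (the subsets are assumed to be JSON lists of prompts).
with open("COCO_subset.json") as f:
    subset = json.load(f)

print(f"{len(subset)} entries in the subset")
print(subset[0])  # inspect the structure of one entry
```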
Alternatively, if you want to use the FlashEval algorithm to search for subsets yourself, follow the steps below.
To install the required dependencies, use the following commands:
```bash
conda create -n flasheval python=3.9
conda activate flasheval
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
cd FlashEval
pip install -e .
```
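After installation, a quick sanity check (our suggestion, not a repo script) confirms that the CUDA build of PyTorch is visible:

```bash
# Should print the torch version and True if the GPU is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```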
As an example, we process the COCO and diffusionDB datasets used in the paper; you can easily substitute your own dataset. If you use the same model settings as in the paper, we also provide all of the processed One-time Generation data in all_metrics for convenience.
We provide the prompts of the two datasets, processed according to Section 5.1: COCO_40504.json and diffusionDB_5000.json.
For full-precision models, run the following command. For quantized models, please use q-diffusion to generate images.

```bash
python Image_generation/generate.py --model_name <choose model> --scheduler <choose scheduler> --gpu_id <GPU ID to use for CUDA> --seed <seed> --step <step> --save_dir <save_path>
```
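For concreteness, a hypothetical invocation might look like the following; the model and scheduler names below are placeholders, and the accepted values are defined inside Image_generation/generate.py:

```bash
# Placeholder model/scheduler names -- check Image_generation/generate.py
# for the choices the script actually supports.
python Image_generation/generate.py \
    --model_name sd-v1-5 \
    --scheduler dpm \
    --gpu_id 0 \
    --seed 42 \
    --step 20 \
    --save_dir outputs/sd-v1-5_dpm_20steps
```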
The metrics considered in this paper are:
- CLIP, which measures text-image alignment (see the illustrative sketch after this list)
- Aesthetic, which measures how good-looking an image is
- ImageReward, which measures human preference for an image
- HPS, which also measures human preference for an image
- FID, for which we accelerate evaluation with a batch-based approach
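To make the CLIP metric concrete, here is a minimal, self-contained sketch using the torchmetrics implementation (requires torchmetrics and transformers); it is not the repo's own evaluation code, just an illustration of what the text-image alignment score computes:

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# Illustration only -- not the repo's implementation of the CLIP metric.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Dummy uint8 image batch (N, C, H, W) standing in for generated images.
images = torch.randint(0, 255, (2, 3, 224, 224), dtype=torch.uint8)
prompts = ["a cat sitting on a sofa", "a red sports car"]

score = metric(images, prompts)  # higher = better text-image alignment
print(score)
```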
For HPS, you first need to download the pre-trained model: HPS. For the other metrics, running the evaluation command will automatically download the required models.
- For single-image evaluation metrics, run the following command:

  ```bash
  python Evaluation/test_benchmark.py --config configs/evaluation.yaml
  ```
- For FID, please download the ground-truth images from COCO and compute the score (an illustrative FID sketch follows this list).
- Organize the model scores and divide them into training and testing models. All of the processed One-time Generation data are in all_metrics.

  ```bash
  python preprocess/get_model_score.py --config configs/get_score.yaml
  ```
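As promised above, here is an illustrative FID sketch using the torchmetrics implementation (requires torchmetrics and torch-fidelity). It shows the basic FID computation; the batch-based acceleration described in the paper lives in this repo, not in this snippet:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Illustration only -- the paper's batch-based FID acceleration is in the
# repo; this shows the plain FID computation it builds on.
fid = FrechetInceptionDistance(feature=2048)

# Dummy uint8 batches (N, 3, H, W); in practice use the real COCO images
# and your generated images, with far more than 16 samples each.
real = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())  # lower = closer to the real-image distribution
```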
Finally, run the search algorithm:

```bash
python search_algorithm.py --config configs/search.yaml
```
Note: since constructing random subsets involves randomness, results may vary across runs. As the number of search iterations increases, the quality of the searched subset improves.
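If you need a repeatable run, a hypothetical seed helper like the one below (not part of the repo) can be called before launching the search; different seeds will still produce different random subsets:

```python
import random

import numpy as np
import torch

# Hypothetical helper: fixing all relevant seeds makes a single
# search run repeatable.
def set_seed(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(42)
```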
If you have any questions, please contact lllzz0309zz@gmail.com.
If you find our work useful, please cite:
```bibtex
@article{zhao2024flasheval,
  title={FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models},
  author={Zhao, Lin and Zhao, Tianchen and Lin, Zinan and Ning, Xuefei and Dai, Guohao and Yang, Huazhong and Wang, Yu},
  journal={arXiv preprint arXiv:2403.16379},
  year={2024}
}
```