Xiang-cd/unet-finetune

quantitative evaluation


Hi, I'm impressed by your work.

I have two questions.

  1. I wonder how you evaluate the CLIP similarity in Fig. 2 and the FID score on the flower dataset.

  2. In Fig. 1(b), the measured time cost shows that your method is faster than Dreambooth.
    But several visual adapter papers report that their computation cost increases slightly due to the added adapter module.
    So I wonder how the time cost is reduced in your work.

Thanks in advance.

Hi, thank you for opening the issue.

  1. For the CLIP score, we use the CLIP model to encode each image into a feature vector, normalize every feature to unit length, and take the cosine similarity between feature vectors. We report the mean cosine similarity over all pairs between the two image sets (n1 images in set 1 and n2 images in set 2 give n1*n2 pairs); see the sketch after this list.
  2. For the FID score, we first resize the flower dataset to 256x256 for better alignment, then sample 5000 flowers at 256 resolution covering all flower names (102 in total), and finally compute the FID between the 5000 samples and the resized dataset; a sketch also follows below.
  3. We measure the time cost of the training process. Since we optimize only a very small number of parameters, optimizer.step() is much faster than when updating the whole model, which gives the speedup (sketched below). For inference there is no speedup in theory.
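For point 1, here is a minimal sketch of that pairwise CLIP similarity, not the authors' exact script; it uses the Hugging Face `transformers` CLIP API, and the checkpoint name and image-loading details are assumptions:

```python
# Minimal sketch of the pairwise CLIP similarity described in point 1.
# Assumption: the checkpoint "openai/clip-vit-base-patch32" is a stand-in,
# not necessarily the model used in the paper.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def encode_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    # Normalize to unit length so dot products are cosine similarities.
    return feats / feats.norm(dim=-1, keepdim=True)

def mean_pairwise_clip_score(paths1, paths2):
    f1, f2 = encode_images(paths1), encode_images(paths2)
    # (n1, d) @ (d, n2) -> all n1 * n2 cosine values; report the mean.
    return (f1 @ f2.T).mean().item()
```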
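For point 2, a hedged sketch of the FID protocol using `torchmetrics` (the directory names and file glob are placeholders, and the paper's exact preprocessing may differ):

```python
# Sketch of the FID protocol in point 2: resize real flower images to 256x256,
# then compare against generated samples. Directory paths are placeholders.
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance

def load_as_uint8(dir_path, size=256):
    # torchmetrics' FID expects uint8 tensors of shape (N, 3, H, W) by default.
    batch = []
    for p in sorted(Path(dir_path).glob("*.jpg")):
        img = Image.open(p).convert("RGB").resize((size, size))
        batch.append(torch.from_numpy(np.array(img)).permute(2, 0, 1))
    return torch.stack(batch)

fid = FrechetInceptionDistance(feature=2048)
# In practice, call update() in smaller batches to limit memory.
fid.update(load_as_uint8("flowers_resized_256"), real=True)   # resized dataset
fid.update(load_as_uint8("samples_5000"), real=False)         # 5000 generated flowers
print(fid.compute().item())
```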
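And for point 3, the speedup boils down to handing the optimizer only the adapter parameters; a generic sketch (the function and argument names are hypothetical, not this repo's API):

```python
# Generic sketch of the training speedup in point 3: freeze the backbone and
# give the optimizer only the small adapter parameter set, so each
# optimizer.step() touches far fewer tensors than full fine-tuning.
# The function and argument names are hypothetical.
import torch

def make_adapter_optimizer(backbone: torch.nn.Module,
                           adapter: torch.nn.Module,
                           lr: float = 1e-4) -> torch.optim.Optimizer:
    backbone.requires_grad_(False)  # backbone weights are never updated
    adapter.requires_grad_(True)    # only the adapter is trainable
    return torch.optim.AdamW(adapter.parameters(), lr=lr)
```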

@Xiang-cd

Thanks for your quick reply.

One can guess that the inference speed may be slightly slower than the Dreambooth baseline due to the adapter computation, as reported in other adapter papers.

Do you have plans to release the evaluation code (e.g., CLIP score, FID)?

It would be a big help for reproducing the results if you shared that code.

Thanks in advance.

I may push the code to a dev branch later.

See the tools dir; the CLIP code is in utils.py.