Is using the test file name for inference a fair practice?

Question

Aradhye2002 opened this issue a year ago · 2 comments

Aradhye2002 commented a year ago

Isn't it wrong to use the name of the test image file during inference? For example, if I named the images img1.png, img2.png, ..., the code would not work. Also, you can't run inference on images whose class_id you don't know, or on images that don't belong to any of the class_ids.
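To make the concern concrete, here is a hypothetical sketch of a filename-based class lookup. The helper `class_id_from_filename`, the `CLASS_NAMES` list, and the `<category>_<index>.png` naming scheme are all assumptions for illustration, not the repository's actual code. The point is only that any lookup of this kind breaks as soon as the file names stop encoding the scene.

```python
from pathlib import Path

# Hypothetical scene categories; not the repository's actual list.
CLASS_NAMES = ["bathroom", "bedroom", "kitchen", "living_room"]


def class_id_from_filename(path: str) -> int:
    """Hypothetical helper: infer the scene class from the file name.

    Assumes names like "kitchen_0001.png", where everything before the
    last underscore is the scene category.
    """
    stem = Path(path).stem             # "kitchen_0001.png" -> "kitchen_0001"
    category = stem.rsplit("_", 1)[0]  # "kitchen_0001" -> "kitchen"
    if category not in CLASS_NAMES:
        # e.g. "img1.png" carries no category, so inference cannot proceed
        raise ValueError(f"cannot infer a scene class from {path!r}")
    return CLASS_NAMES.index(category)
```

With this scheme, `class_id_from_filename("data/kitchen_0001.png")` returns 2, while `class_id_from_filename("img1.png")` raises a `ValueError`, which is exactly the failure described in the question.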

Answer 1 · 2023-08-23T12:54:29.000Z

For the depth estimation task, we don't think introducing the scene's category name is unfair. The task focuses on low-level details, while the provided category name is a high-level concept. In VPD, we use the category name only to better exploit the pre-trained knowledge of the text-to-image diffusion model.

We can also run our model without the category name by using another simple network to predict the category of the scene (which is not difficult to learn), and the results should stay the same.
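The workaround described above could look roughly like the following: instead of reading the category from the file name, a small classifier predicts it from image features. This is a minimal sketch assuming a linear head over a precomputed feature vector. `CLASS_NAMES`, `predict_class_id`, and the toy weights are hypothetical; a real setup would train the head on features from an image backbone.

```python
from typing import Sequence

# Hypothetical scene categories; not the repository's actual list.
CLASS_NAMES = ["bathroom", "bedroom", "kitchen", "living_room"]


def predict_class_id(features: Sequence[float],
                     weights: Sequence[Sequence[float]],
                     biases: Sequence[float]) -> int:
    """Linear scene classifier: score_k = w_k . x + b_k; return argmax over k."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    # Index of the highest score (first one wins on ties).
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with features `[1.0, 0.0, 2.0]`, weights `[[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0.5]]` and zero biases, the scores are 1.0, 0.0, 2.0 and 1.5, so the predicted class is index 2 (`"kitchen"`). The predicted name can then be used wherever the filename-derived category was used before.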

Answer 2 · 2023-08-30T08:34:20.000Z

Thanks for the reply!

Another question I had is about averaging the text embeddings for a given class over all the ImageNet templates. What is the motivation for this? In the original Stable Diffusion model, we give a single text prompt, right?
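For reference, "averaging over templates" usually means the prompt-ensembling trick popularized by CLIP: fill each template with the class name, encode each prompt, normalize, average, and renormalize. The sketch below illustrates only that operation and is not VPD's actual code. The encoder `toy_text_encoder` is a hypothetical stand-in (a hash-based pseudo-embedding) so the example runs without model weights, and `TEMPLATES` is an illustrative subset in the style of CLIP's ImageNet templates.

```python
import hashlib
import math
from typing import Callable, List, Sequence

# Illustrative subset in the style of CLIP's ImageNet prompt templates.
TEMPLATES = [
    "a photo of a {}.",
    "a bad photo of a {}.",
    "a close-up photo of the {}.",
    "a rendering of a {}.",
]


def toy_text_encoder(text: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a real text encoder (e.g. CLIP's).

    Maps a string to a deterministic pseudo-random vector in [-1, 1]^dim.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [(digest[i] - 127.5) / 127.5 for i in range(dim)]


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_embedding(class_name: str,
                    encode: Callable[[str], List[float]] = toy_text_encoder,
                    templates: Sequence[str] = TEMPLATES) -> List[float]:
    """Average the normalized embeddings of every filled-in template."""
    embs = [normalize(encode(t.format(class_name))) for t in templates]
    mean = [sum(col) / len(embs) for col in zip(*embs)]
    return normalize(mean)
```

The result is one unit-length vector per class that depends less on the wording of any single template; in CLIP, averaging over many templates was reported to improve zero-shot classification. Because the averaging happens in embedding space, it costs one encoder call per template, once per class, and the result can be cached and fed to the model in place of a single prompt's embedding.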