Reproduce SD (& other models') results
zwcolin opened this issue · 2 comments
Hello,
I was wondering if there's any way to reproduce the results in your main table? I didn't find any information about seed/generator usage in the codebase or the report, so I'd really appreciate it if you could provide some insight into reproducing these numbers. Thanks!
Also, a side question: for all the SD results mentioned in the codebase and the report, are you using Stable Diffusion 1.4 or 1.5?
Hi,
we directly used the open-source versions of the models we tested in our benchmark. For SD, we used version 1.4 with the default parameters from https://github.com/CompVis/stable-diffusion/blob/main/scripts/txt2img.py
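For anyone trying to regenerate comparable images, here is a minimal sketch using the diffusers port of SD 1.4 with settings that approximate the txt2img.py defaults (50 sampling steps, guidance scale 7.5, 512x512). Note that our images were produced with the original CompVis scripts, and the prompt and seed below are illustrative only, not the ones used in the paper:

```python
# Sketch only: diffusers-based approximation of the CompVis txt2img.py defaults.
# The prompt and seed are examples, not the settings used for the benchmark runs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat to the left of a dog"          # example SR2D-style prompt
generator = torch.Generator("cuda").manual_seed(42)  # illustrative seed only

image = pipe(
    prompt,
    num_inference_steps=50,   # matches the default ddim_steps in txt2img.py
    guidance_scale=7.5,       # matches the default scale in txt2img.py
    height=512,
    width=512,
    generator=generator,
).images[0]
image.save("sd14_sample.png")
```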
Note that we have provided the outputs of our object detector in ./objdet_results. All of our generated images are available here: https://huggingface.co/datasets/tgokhale/sr2d_visor/tree/main
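If it helps, the full image set can be pulled locally with something like the snippet below (a sketch using huggingface_hub; the local directory name is arbitrary):

```python
# Sketch: download the released sr2d_visor images from the Hugging Face Hub.
# Requires `pip install huggingface_hub`; local_dir is just an example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="tgokhale/sr2d_visor",
    repo_type="dataset",      # the images are hosted as a dataset repo
    local_dir="sr2d_visor",   # arbitrary destination folder
)
```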
Happy to provide more information if needed.
Thanks for the response and the additional information!
Another question we had: how accurate is the detector on ground-truth images? That is, with the threshold set to 0.1, do you provide a VISOR reference for ground-truth image-prompt (or image-caption) pairs, e.g., for all captions involving spatial relationships from the COCO dataset?