The evaluation results depend on batch size
apple2373 opened this issue · 1 comments
I noticed that the evaluation results differ depending on the batch size. The cause appears to be this line:
https://github.com/yuyanli0831/OmniFusion/blob/aaf52cc953ade3be1f5fc3df446705e4223b8d21/test.py#L161
Because the median is computed over all images in a batch, its value changes when different images are grouped together, so the results differ slightly depending on the batch size. If we want deterministic results, we can either disable the median alignment or compute the median per image.
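To make the difference concrete, here is a minimal NumPy sketch of the two options. It assumes the linked line rescales the prediction by the ratio of the ground-truth median to the predicted median over the whole batch tensor; I have not copied the actual code, and the function names `align_per_batch` and `align_per_image` are hypothetical.

```python
import numpy as np


def median_scale(pred, gt, mask):
    """Ratio of medians over valid pixels (assumed form of the alignment)."""
    return np.median(gt[mask]) / np.median(pred[mask])


def align_per_batch(pred, gt, mask):
    """One scale factor shared by every image in the batch.

    The factor depends on which images are grouped together,
    so the metrics change with the batch size.
    """
    return pred * median_scale(pred, gt, mask)


def align_per_image(pred, gt, mask):
    """One scale factor per image, independent of batch grouping."""
    out = np.empty_like(pred)
    for i in range(pred.shape[0]):
        out[i] = pred[i] * median_scale(pred[i], gt[i], mask[i])
    return out


# Toy batch: two "depth maps" whose predictions are off by different scales.
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 10.0, size=(2, 4, 8))
pred = gt * np.array([0.5, 2.0])[:, None, None]
mask = gt > 0

per_image = align_per_image(pred, gt, mask)
per_batch = align_per_batch(pred, gt, mask)
```

With per-image alignment, each image gets the same scale whether it is evaluated alone or inside a larger batch, so the reported metrics no longer depend on the batch size.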
Beyond the implementation, I wonder why we need median alignment at all. My best guess is this: if every object moves twice as far away and also becomes twice as big, the image looks the same. (Am I correct? I can imagine there is some scale ambiguity when we only have a single image.) To address that scale ambiguity, we use the medians of the ground-truth and predicted depth maps to align their scales.
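The "twice as far and twice as big" intuition can be checked numerically. The sketch below is only an illustration, not code from the repository: it assumes an ideal pinhole camera and an ideal spherical (360°) camera, and the helper names `project_pinhole` and `project_sphere` are made up for this example.

```python
import numpy as np


def project_pinhole(points, focal=1.0):
    """Project N x 3 camera-frame points to image coordinates focal * (x/z, y/z)."""
    return focal * points[:, :2] / points[:, 2:3]


def project_sphere(points):
    """Viewing direction of each point on the unit sphere (ideal 360° camera)."""
    return points / np.linalg.norm(points, axis=1, keepdims=True)


points = np.array([[1.0, 2.0, 4.0],
                   [-0.5, 0.3, 2.0]])
scaled = 2.0 * points  # every point twice as far away, scene twice as big

same_pinhole = np.allclose(project_pinhole(points), project_pinhole(scaled))
same_sphere = np.allclose(project_sphere(points), project_sphere(scaled))
```

Both projections are unchanged by the global scale factor, which is the usual scale ambiguity of depth from a single image: the image alone cannot tell the two scenes apart.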