agakshat/visualdialog-pytorch

Evaluation Metrics


Hi, thanks for sharing.
I'm a little confused, though. Do you evaluate with only 20 candidate answers?

Hi @wh0330

No. For the quantitative metrics we used the entire validation set of the Visual Dialog v0.9 dataset (~40k images), while for the qualitative human evaluation we used ~100 samples. Any part of the code that uses only 20 answers for evaluation is probably left over from last-minute debugging.
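
To illustrate why the number of candidates matters, here is a minimal sketch of how rank-based retrieval metrics (mean rank, MRR, recall@k) are commonly computed from per-candidate scores. It is not taken from this repository: the function names `gt_rank` and `retrieval_metrics` and the toy scores are hypothetical. Ranking the ground truth against fewer candidates can only keep its rank the same or improve it, so numbers computed on a 20-answer subset would not be comparable to numbers computed on the full candidate list.

```python
from typing import Dict, List, Sequence


def gt_rank(scores: Sequence[float], gt_index: int) -> int:
    """Return the 1-based rank of the ground-truth candidate (higher score = better)."""
    gt_score = scores[gt_index]
    # Rank is one plus the number of candidates scored strictly higher.
    return 1 + sum(1 for s in scores if s > gt_score)


def retrieval_metrics(ranks: List[int], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Aggregate ground-truth ranks into mean rank, MRR, and recall@k."""
    n = len(ranks)
    metrics = {
        "mean_rank": sum(ranks) / n,
        "mrr": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        # Fraction of questions whose ground truth lands in the top k.
        metrics[f"recall@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Toy data (hypothetical): two questions with four candidates each.
toy_scores = [[0.1, 0.7, 0.2, 0.0], [0.4, 0.3, 0.2, 0.1]]
toy_gt = [2, 0]  # index of the ground-truth answer for each question
ranks = [gt_rank(s, g) for s, g in zip(toy_scores, toy_gt)]
metrics = retrieval_metrics(ranks)
print(ranks, metrics)
```

In a real evaluation the score lists would come from the model and would cover every candidate answer for every question in the validation set.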