Evaluation Metrics
Closed this issue · 1 comment
wh0330 commented
Hi, thanks for sharing.
I'm a little confused, though. Do you evaluate with only 20 candidate answers?
agakshat commented
Hi @wh0330
No. For the quantitative metrics we used the entire validation set of the Visual Dialog v0.9 dataset (~40k images), while for the qualitative human evaluation we used ~100 samples. Any part of the code that uses only 20 answers for evaluation is probably a leftover from last-minute debugging.
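
For anyone who wants to avoid this kind of leftover, here is a minimal sketch of keeping a debugging cap explicit and off by default. This is not the repository's code: `evaluate`, `rank_of_ground_truth`, and `debug_limit` are hypothetical names, and the rank-based metrics (MRR, recall@k, mean rank) are an assumption, since the thread does not say which quantitative metrics were computed.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple


def rank_of_ground_truth(scores: Sequence[float], gt_index: int) -> int:
    """Return the 1-based rank of the ground-truth candidate (higher score is better)."""
    gt_score = scores[gt_index]
    # Every candidate that scores strictly higher pushes the ground truth down one place.
    return 1 + sum(1 for s in scores if s > gt_score)


def evaluate(
    samples: Iterable[Tuple[Sequence[float], int]],
    debug_limit: Optional[int] = None,
) -> Dict[str, float]:
    """Compute rank-based metrics over (candidate_scores, gt_index) pairs.

    debug_limit is a hypothetical debugging cap on the number of samples.
    It defaults to None, so the full set is used unless a cap is requested.
    """
    ranks = []
    for i, (scores, gt_index) in enumerate(samples):
        if debug_limit is not None and i >= debug_limit:
            break  # stop early only when a cap was explicitly set
        ranks.append(rank_of_ground_truth(scores, gt_index))

    if not ranks:
        raise ValueError("no samples to evaluate")

    n = len(ranks)
    return {
        "num_samples": n,  # report this so a capped run is easy to spot
        "mrr": sum(1.0 / r for r in ranks) / n,
        "recall@1": sum(r <= 1 for r in ranks) / n,
        "recall@5": sum(r <= 5 for r in ranks) / n,
        "mean_rank": sum(ranks) / n,
    }


if __name__ == "__main__":
    # Toy data: three samples with three candidate scores each.
    toy = [([0.1, 0.9, 0.3], 1), ([0.5, 0.2, 0.1], 1), ([0.3, 0.2, 0.9], 0)]
    print(evaluate(toy))                 # full run over all samples
    print(evaluate(toy, debug_limit=1))  # capped run, for debugging only
```

Returning `num_samples` alongside the metrics makes it obvious from the output alone whether a run covered the whole validation set or only a debugging subset.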