jcjohnson/densecap

What is the ground truth when I use natural language queries to retrieve the source image?

helloworldwxr opened this issue · 0 comments

In your paper, your dense captioning model supports image retrieval using natural language queries and can localize these queries in the retrieved images. What is the ground truth when you calculate R@n?
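For reference, here is a minimal sketch of how R@n is usually computed for this kind of retrieval. It assumes, as the issue title suggests, that the ground truth for each query is the single source image the query was taken from. The function name `recall_at_n` and the `scores` matrix (one similarity score per query/image pair, higher meaning a better match) are hypothetical illustrations, not the DenseCap code.

```python
def recall_at_n(scores, source_ids, n):
    """Fraction of queries whose source image is ranked within the top n.

    scores[q][i]: hypothetical similarity of query q to image i (higher = better).
    source_ids[q]: index of the image query q was taken from (assumed ground truth).
    """
    hits = 0
    for q_scores, src in zip(scores, source_ids):
        # Rank image indices by descending score for this query.
        ranked = sorted(range(len(q_scores)), key=lambda i: q_scores[i], reverse=True)
        # Count a hit if the source image appears in the top-n results.
        if src in ranked[:n]:
            hits += 1
    return hits / len(source_ids)


scores = [
    [0.1, 0.9, 0.5],  # query 0: source image 1 is ranked first
    [0.8, 0.2, 0.3],  # query 1: source image 1 is ranked third
    [0.4, 0.1, 0.7],  # query 2: source image 2 is ranked first
]
source_ids = [1, 1, 2]
for n in (1, 2, 3):
    print(f"R@{n} = {recall_at_n(scores, source_ids, n):.3f}")
```

Under this assumption, R@n only credits a query when its own source image lands in the top n, even if other retrieved images contain visually similar regions.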