What is the ground truth when I use natural language queries to retrieve the source image?

Question

helloworldwxr opened this issue 8 years ago · 0 comments

In your paper, your dense captioning model supports image retrieval using natural language queries and can localize those queries in the retrieved images. What is the ground truth when you calculate R@n?