tuckerowens/trec-car-eval

bug in computing evaluation results across queries

laura-dietz opened this issue · 1 comment

It seems that when computing an eval measure across queries, you only average over the rankings that were given to you, but you do not penalize queries that are not answered with any ranking at all.

Example:
The task was to rank elements for 100 queries.
The system retrieves a (non-empty) result for only 20 of them.

That's not a good system, is it?
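
To make the averaging issue concrete, here is a minimal sketch of the two behaviours (not trec-car-eval's actual code; the topic ids and the flat 0.5 per-query score are hypothetical illustration data):

```python
# Minimal sketch of the averaging bug. The topic ids and the flat 0.5
# per-query score are hypothetical illustration data, not real eval output.

all_topics = [f"topic-{i}" for i in range(100)]        # the task has 100 queries
per_query_scores = {t: 0.5 for t in all_topics[:20]}   # the run answers only 20

# Current behaviour: average only over queries that appear in the run file.
mean_over_answered = sum(per_query_scores.values()) / len(per_query_scores)

# Expected behaviour: average over every query in the topic set,
# scoring an unanswered query as 0.
mean_over_all = sum(per_query_scores.get(t, 0.0) for t in all_topics) / len(all_topics)

print(mean_over_answered)  # 0.5 -- the system looks deceptively good
print(mean_over_all)       # 0.1 -- the 80 missing rankings are penalized
```

(For comparison, the classic trec_eval tool addresses the same issue with its -c option, which averages over the complete set of judged queries rather than only those present in the run.)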

I simulated this case by dropping a bunch of queries from the test200 mock case, roughly as sketched below the attachment (GitHub made me rename *run to *txt).

test200-mock2.txt
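
Such a mock run can be reproduced with a small filter over the original run file, assuming the standard TREC run format where the query id is the first whitespace-separated column; the input file name and the choice of which 20 query ids to keep are illustrative assumptions:

```python
# Simulate a system that answers only a subset of the queries by dropping
# all other queries from a TREC-format run file. The input file name and
# the choice of which 20 query ids to keep are illustrative assumptions.

with open("test200.run") as src:
    lines = src.readlines()

# Keep the first 20 distinct query ids that appear in the run.
kept_qids = []
for line in lines:
    qid = line.split()[0]              # query id is the first column of a run line
    if qid not in kept_qids and len(kept_qids) < 20:
        kept_qids.append(qid)
kept = set(kept_qids)

with open("test200-mock2.txt", "w") as dst:
    dst.writelines(line for line in lines if line.split()[0] in kept)
```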

I hadn't even considered a system that bad. Thank you.