thunlp/OpenMatch

How is the ANCE FirstP training data generated?

xyz8 opened this issue · 1 comment

xyz8 commented

How is the ANCE FirstP training data (bids_marco-doc_ance-maxp-10.tsv) generated?

zkt12 commented

Hi,

We followed the instructions in https://github.com/thunlp/OpenMatch/blob/master/retrievers/openmatch_ance_retriver_readme.md: we loaded the TREC DL document FirstP/MaxP checkpoint, encoded the queries and documents with it, and ran inference to retrieve the top-k documents most similar to each query.

For each manually labeled query-document pair, we randomly selected 10 documents from that query's top-k retrieved subset as negatives.
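
The sampling step could look roughly like the sketch below. It is only an illustration, not the script we actually ran: the function name `sample_negatives`, the dict-based inputs, and the fixed seed are all hypothetical, and removing the labeled positives from the candidate pool is an assumption that may or may not match the original pipeline.

```python
import random


def sample_negatives(qrels, topk, num_negatives=10, seed=0):
    """Pair each labeled (query, positive doc) with random top-k negatives.

    qrels: dict mapping query id -> list of manually labeled positive doc ids
    topk:  dict mapping query id -> list of doc ids retrieved for that query
    Returns a list of (query_id, positive_doc_id, negative_doc_id) triples.
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    triples = []
    for qid, positives in qrels.items():
        labeled = set(positives)
        # Assumption: labeled positives are excluded from the negative pool.
        candidates = [d for d in topk.get(qid, []) if d not in labeled]
        for pos in positives:
            # Sample without replacement; use fewer if the pool is small.
            k = min(num_negatives, len(candidates))
            for neg in rng.sample(candidates, k):
                triples.append((qid, pos, neg))
    return triples


if __name__ == "__main__":
    # Toy data: one query, one labeled positive, 100 retrieved documents.
    qrels = {"q1": ["d1"]}
    topk = {"q1": [f"d{i}" for i in range(1, 101)]}
    for triple in sample_negatives(qrels, topk):
        print("\t".join(triple))
```

Each labeled pair gets its own independent draw of 10 documents, so a query with several labeled positives yields several groups of negatives.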

Kaitao