Merge relevant docs when generating Dev dataset
jingtaozhan opened this issue · 1 comment
jingtaozhan commented
I'm having trouble understanding the following code from `convert_msmarco_to_duobert_tfrecord.py`:
```python
qrels = None
if set_name != 'test':
    qrels = load_qrels(path=qrels_path)
queries = load_queries(queries_path)
run = load_run(path=run_path)
data = merge(qrels=qrels, run=run, queries=queries)
```
When the TFRecord for the dev set is generated, the relevant docs are merged into the data together with the ranked list output by monoBERT. I'm confused about this: does it mean that the dev set results aren't comparable with the test set results?
jingtaozhan commented
Oh, I get it. The qrels are used to generate the labels, not to add the relevant docs as candidates.
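To make that resolution concrete, here is a minimal sketch of what such a merge step could look like. It is hypothetical: the dict-based shapes for `qrels`, `run`, and `queries` and the returned `(query, candidates, labels)` tuple are assumptions for illustration, not the actual implementation in `convert_msmarco_to_duobert_tfrecord.py`. It shows the point above: the candidate list comes only from the run, and the qrels only decide each candidate's label, so a relevant doc that the first stage missed is never injected.

```python
from typing import Dict, List, Optional, Set, Tuple


def merge(
    qrels: Optional[Dict[str, Set[str]]],
    run: Dict[str, List[str]],
    queries: Dict[str, str],
) -> Dict[str, Tuple[str, List[str], List[int]]]:
    """Hypothetical merge: candidates come only from the run; qrels only label them."""
    data: Dict[str, Tuple[str, List[str], List[int]]] = {}
    for qid, candidates in run.items():
        # On the test set qrels is None, so every candidate gets label 0.
        relevant = qrels.get(qid, set()) if qrels is not None else set()
        # Label each retrieved candidate 1 if judged relevant, else 0.
        # Relevant docs absent from the run are NOT added as candidates.
        labels = [1 if doc_id in relevant else 0 for doc_id in candidates]
        data[qid] = (queries[qid], list(candidates), labels)
    return data


qrels = {"q1": {"d3", "d9"}}       # d9 was never retrieved by the first stage
run = {"q1": ["d1", "d3", "d5"]}   # ranked list from the previous stage
queries = {"q1": "what is duobert"}
data = merge(qrels=qrels, run=run, queries=queries)
print(data["q1"])  # ('what is duobert', ['d1', 'd3', 'd5'], [0, 1, 0])
```

Because `d9` never enters the candidate list, the dev set is re-ranked over exactly the same kind of input as the test set, and the two results remain comparable.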