Query about additional column in valid and test

Question


Closed this issue 4 years ago · 4 comments

Hello, I was going through the dataset and noticed that the validation and test sets have an additional column containing candidate responses. I don't see this additional column being referenced in the dataloader.

The original paper mentions that the candidates were sampled from three sources: the training set, DailyDialog, and Reddit conversations. The validation code seems to do exactly that.

I'm not sure whether I'm misreading the code or whether this repo's evaluation code does not reflect how training and evaluation are currently done for this dataset. Could you clarify? Thanks in advance for your help!

Answer 1 · 2020-04-20T20:04:04.000Z

Hi! Yes, each response in the validation set comes with 100 candidate responses drawn randomly from the validation set (including the gold response), and the same holds for the test set.
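To illustrate the structure described above, here is a minimal sketch of how a 100-candidate set of this kind could be assembled: the gold response plus 99 distractors drawn uniformly at random from a pool of responses from the same split. `build_candidates` and its parameters are hypothetical and not part of the repo; the released validation and test files already contain the candidates, so this only shows what that extra column represents.

```python
import random


def build_candidates(gold, pool, k=100, seed=None):
    """Hypothetical helper: return k candidates made of the gold response
    plus k - 1 distractors sampled uniformly at random from `pool`."""
    rng = random.Random(seed)
    # Drop the gold response from the pool so it is not sampled again as a distractor.
    distractors = [r for r in pool if r != gold]
    if len(distractors) < k - 1:
        raise ValueError("pool is too small to draw k - 1 distractors")
    candidates = rng.sample(distractors, k - 1) + [gold]
    # Shuffle so the gold response does not always sit in the last position.
    rng.shuffle(candidates)
    return candidates


pool = [f"response {i}" for i in range(500)]
cands = build_candidates("response 0", pool, seed=0)
```

Sampling without replacement (`random.Random.sample`) keeps the distractors distinct as long as the pool itself has no duplicate strings.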

Answer 2 · 2021-05-26T12:44:38.000Z

Hi @EricMichaelSmith, are we supposed to use these provided candidates for evaluation in order to match the P@1,100 reported in the paper?

Thanks and regards,
Kunal Pagarey

Answer 3 · 2021-05-26T16:52:14.000Z

Yes @kunalpagarey - as far as I remember, those should be the candidates used to match the paper numbers.
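P@1,100 measures how often the model ranks the gold response first among the 100 candidates. The following is a minimal sketch of that metric, assuming the model has already produced one score per candidate for each example and that the index of the gold candidate is known. The function name and input layout are illustrative assumptions, not the repo's evaluation code.

```python
def precision_at_1(scores_per_example, gold_indices):
    """Fraction of examples whose highest-scoring candidate is the gold one.

    scores_per_example: list of lists, one score per candidate (e.g. 100 each).
    gold_indices: index of the gold candidate within each example's list.
    """
    if not gold_indices or len(scores_per_example) != len(gold_indices):
        raise ValueError("need one non-empty score list per gold index")
    hits = 0
    for scores, gold in zip(scores_per_example, gold_indices):
        # max() returns the first index on ties, so ties favor earlier candidates.
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == gold)
    return hits / len(gold_indices)


# Two toy examples with 3 candidates each: the first is ranked correctly, the second is not.
p = precision_at_1([[0.1, 0.9, 0.3], [0.8, 0.2, 0.1]], [1, 2])
```

With the released files, each score list would have 100 entries, one per provided candidate.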

Answer 4 · 2021-05-26T17:10:58.000Z

@EricMichaelSmith You really reply quickly, thank you so much 😀