facebookresearch/EmpatheticDialogues

The retrieval-based method


Hi, I have several questions regarding the retrieval-based model:

1. How do you get the 100 candidates at inference time when calculating P@1,100?
2. At training time, you use all of the utterances from the batch as candidates and minimize the negative log-likelihood of selecting the correct candidate. Why not sample negative examples at a fixed ratio instead, e.g., 9 negative examples per positive example? Did you compare these two methods?

Looking forward to your reply.
Best wishes

Hi there!

  1. We split the candidates into groups of 100 at https://github.com/facebookresearch/EmpatheticDialogues/blob/master/retrieval_train.py#L137 (there's a sketch of this evaluation after this list).
  2. Do you mean using 9 negative examples instead of 511 negative examples for a batch size of 512? Generally I've found that larger candidate batches work better for training retrieval models. One intuition for this is that it's a harder job to select the right answer among 100 candidates than among 10, so the model learns more about how to pick the right candidate when it has to pick from a larger pool (see the loss sketch after this list).
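For reference, here is roughly what the P@1,100 evaluation looks like. This is a minimal PyTorch-style sketch rather than the repo's actual code; the function name `precision_at_1_of_100` and the convention that the gold response sits at index 0 of each group are my own assumptions:

```python
import torch

def precision_at_1_of_100(context_embs, candidate_embs, group_size=100):
    """P@1,100 sketch: for each group of 100 candidates, check whether
    the gold response scores highest under dot-product similarity.

    context_embs:   [num_groups, dim]               one context per group
    candidate_embs: [num_groups * group_size, dim]  gold assumed at index 0
                                                    of each group (my convention)
    """
    num_groups = context_embs.size(0)
    candidates = candidate_embs.view(num_groups, group_size, -1)
    # Score every candidate in a group against that group's context.
    scores = torch.einsum('gd,gcd->gc', context_embs, candidates)
    # A hit when the gold candidate (index 0 here) ranks first.
    hits = (scores.argmax(dim=1) == 0).float()
    return hits.mean().item()
```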
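And a minimal sketch of the training loss with in-batch negatives, again not the repo's exact code: with a batch of 512, the 511 non-matching responses in the batch serve as negatives for each context, and the loss is a softmax cross-entropy over the [batch, batch] dot-product matrix with the gold responses on the diagonal:

```python
import torch
import torch.nn.functional as F

def in_batch_nll_loss(context_embs, response_embs):
    """In-batch negatives sketch: every other response in the batch is a
    negative for each context.

    context_embs, response_embs: [batch, dim]
    """
    # [batch, batch] matrix; entry (i, j) scores context i vs. response j.
    dot_products = context_embs @ response_embs.t()
    # The correct response for context i is on the diagonal (j == i).
    targets = torch.arange(context_embs.size(0), device=context_embs.device)
    # Negative log-likelihood of selecting the correct candidate.
    return F.cross_entropy(dot_products, targets)
```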

Hi, I think your method is likely a pointwise method rather than a pairwise method, is that right? Your method is based on separate representations of the contexts and responses. If I wanted to use an interaction-based method, for example concatenating each context and response and outputting a matching probability, I couldn't form the [batch, batch] dot_products matrix. So I guess you use a representation-based method rather than an interaction-based method for training for this reason; I don't know whether my understanding is right. Thanks.

Oh, I think I see what you're saying: yes, we're using a biencoder architecture, which considers the [batch, batch] dot-products matrix of contexts and responses. No, we haven't tried a cross-encoder architecture, which would concatenate the contexts and responses together. I imagine that architecture would likely perform a bit better, at the expense of slower training speed.
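To make the distinction concrete, here's a hypothetical sketch of the two scoring schemes; the encoder modules passed in are placeholders, not the repo's actual models:

```python
import torch
import torch.nn as nn

class BiEncoderScorer(nn.Module):
    """Encodes contexts and responses separately; scoring is a dot
    product, so all batch-vs-batch scores come from one matmul."""
    def __init__(self, context_encoder, response_encoder):
        super().__init__()
        self.context_encoder = context_encoder
        self.response_encoder = response_encoder

    def forward(self, contexts, responses):
        ctx = self.context_encoder(contexts)     # [batch, dim]
        resp = self.response_encoder(responses)  # [batch, dim]
        return ctx @ resp.t()                    # [batch, batch] score matrix

class CrossEncoderScorer(nn.Module):
    """Encodes each concatenated (context, response) pair jointly, so
    every pair needs its own forward pass."""
    def __init__(self, joint_encoder, hidden_dim):
        super().__init__()
        self.joint_encoder = joint_encoder
        self.score_head = nn.Linear(hidden_dim, 1)

    def forward(self, context_response_pairs):
        h = self.joint_encoder(context_response_pairs)  # [num_pairs, dim]
        return self.score_head(h).squeeze(-1)           # [num_pairs] scores
```

The biencoder gets the full [batch, batch] score matrix from a single matrix multiply, which is what makes in-batch negatives cheap; the cross-encoder must run a forward pass per (context, response) pair, which is why it models richer interactions but trains more slowly.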

Thank you for your reply : )

No problem!