sfzhou5678/PolyEncoder

Enquiry about the implementation of the models

Closed this issue · 2 comments

Hi, thanks for your work.

It seems that in your implementation there is only one BERT in the bi-encoder and the poly-encoder. In

if isinstance(self.bert, DistilBertModel):

the context and the response are encoded with the same BERT model, self.bert(), and the outputs are then passed through separate linear layers.
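
For concreteness, here is a minimal sketch of that shared-encoder pattern (class and attribute names such as SharedBiEncoder, context_fc and response_fc are illustrative, not the repo's exact code):

```python
import torch.nn as nn
from transformers import BertModel


class SharedBiEncoder(nn.Module):
    """One BERT shared by both sides, with side-specific projection heads."""

    def __init__(self, bert_name="bert-base-uncased", out_dim=768):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)  # single encoder for context and response
        self.context_fc = nn.Linear(self.bert.config.hidden_size, out_dim)
        self.response_fc = nn.Linear(self.bert.config.hidden_size, out_dim)

    def encode(self, input_ids, attention_mask, fc):
        # [CLS] pooling of the shared BERT, then a side-specific linear projection
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return fc(hidden[:, 0])

    def forward(self, ctx_ids, ctx_mask, resp_ids, resp_mask):
        ctx_vec = self.encode(ctx_ids, ctx_mask, self.context_fc)
        resp_vec = self.encode(resp_ids, resp_mask, self.response_fc)
        # dot-product scores between every context and every candidate response
        return ctx_vec @ resp_vec.t()
```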

However, the Poly-encoder paper states that there should be two Transformers in the bi-encoder architecture, so I assume we might need two BERTs, one for context encoding and one for response encoding.

Your implementation with a single BERT is clearly more lightweight and should converge faster than the original design. Do you think there are any possible downsides?

Hi,
Thanks for reaching out, it's really a good question.

In my experience, using two separate models for the context and the response does not make much difference to the results, and several papers support this view (e.g. https://arxiv.org/abs/1911.03688).
Since either approach should lead to the same conclusions about the poly-encoder, I chose the faster one.

If you are interested, you could modify the current code to compare the two approaches on this task, as in the rough sketch below. It would be greatly appreciated!
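
A standalone sketch of the two-encoder variant might look like this (names such as TwoTowerBiEncoder are illustrative, not taken from the repo; copy.deepcopy just gives the response tower the same pretrained starting point while letting the two encoders be fine-tuned independently):

```python
import copy

import torch.nn as nn
from transformers import BertModel


class TwoTowerBiEncoder(nn.Module):
    """Separate BERT encoders for context and response, with projection heads."""

    def __init__(self, bert_name="bert-base-uncased", out_dim=768):
        super().__init__()
        self.context_bert = BertModel.from_pretrained(bert_name)
        # Same pretrained weights at initialization, but the two towers
        # receive separate gradient updates during fine-tuning.
        self.response_bert = copy.deepcopy(self.context_bert)
        hidden = self.context_bert.config.hidden_size
        self.context_fc = nn.Linear(hidden, out_dim)
        self.response_fc = nn.Linear(hidden, out_dim)

    def forward(self, ctx_ids, ctx_mask, resp_ids, resp_mask):
        ctx_vec = self.context_fc(
            self.context_bert(input_ids=ctx_ids, attention_mask=ctx_mask).last_hidden_state[:, 0]
        )
        resp_vec = self.response_fc(
            self.response_bert(input_ids=resp_ids, attention_mask=resp_mask).last_hidden_state[:, 0]
        )
        return ctx_vec @ resp_vec.t()  # dot-product retrieval scores
```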

Thanks.

Thanks for your reply.

using two separate models for the context and the response does not make much difference to the results

I think this may vary from task to task. This issue can be closed.

I don't think it would be hard to run a "real" 2-BERT architecture with your repo. I might open a PR for this in the future if time permits. (next time, for sure!)