ChunyuanLI/Optimus

Question: why this choice of BERT and GPT2?

alxthm opened this issue · 2 comments

Hi,
Thank you for this work and for releasing the code as well! 🎉
I was wondering if there was any reason you chose to use BERT as an encoder and GPT2 as a decoder, instead of other pretrained language models? In particular, why not consider models that already have an encoder/decoder architecture, such as T5 or BART?
Thanks

  1. Historically, T5/BART had not been publicly released yet when we started the Optimus project.
  2. Though all these models form an encoder-decoder architecture, Optimus aims to learn a compact sentence-level representation, so only a single vector representation is used to link the encoder and decoder (see the sketch below). If dense, per-token representations are the goal instead, T5/BART should be considered.
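To make point 2 concrete, here is a minimal sketch (not the released Optimus code) of how a single latent vector can bridge a BERT encoder and a GPT-2 decoder: the encoder's [CLS] state is compressed into one vector z, and z is projected into the decoder's embedding space as a conditioning token. The class and layer names (`SentenceBottleneck`, `to_latent`, `to_decoder_embed`) are illustrative assumptions; Optimus additionally treats z as a VAE latent sampled from a posterior, rather than the deterministic projection shown here.

```python
# Minimal sketch of a single-vector bottleneck between BERT and GPT-2.
# Not the Optimus implementation; names and wiring are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, GPT2LMHeadModel

class SentenceBottleneck(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-cased")
        self.decoder = GPT2LMHeadModel.from_pretrained("gpt2")
        # Compress the [CLS] hidden state into one compact sentence vector z ...
        self.to_latent = nn.Linear(self.encoder.config.hidden_size, latent_dim)
        # ... and map z into the decoder's embedding space as a conditioning token.
        self.to_decoder_embed = nn.Linear(latent_dim, self.decoder.config.n_embd)

    def forward(self, enc_input_ids, enc_attention_mask, dec_input_ids):
        enc_out = self.encoder(enc_input_ids, attention_mask=enc_attention_mask)
        cls = enc_out.last_hidden_state[:, 0]          # [CLS] vector per sentence
        z = self.to_latent(cls)                        # the single latent vector
        cond = self.to_decoder_embed(z).unsqueeze(1)   # (batch, 1, n_embd)
        tok_embeds = self.decoder.transformer.wte(dec_input_ids)
        # Prepend the conditioning vector so every decoded token can attend to z.
        inputs_embeds = torch.cat([cond, tok_embeds], dim=1)
        return self.decoder(inputs_embeds=inputs_embeds)
```

In contrast, a T5/BART-style setup lets the decoder cross-attend to the full sequence of encoder hidden states, so there is no single vector that summarizes the sentence, which is exactly what Optimus wants to learn.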

This makes sense, thanks :)