ChunyuanLI/Optimus

Question: why this choice of BERT and GPT2?

alxthm opened this issue · 2 comments

Hi,
Thank you for this work and for releasing the code as well! 🎉
I was wondering if there was any reason you chose to use BERT as an encoder and GPT2 as a decoder, instead of other pretrained language models? In particular, why not consider models that already have an encoder/decoder architecture, such as T5 or BART?
Thanks

  1. Historically, T5/BART had not been publicly released yet when we started the Optimus project.
  2. Though all these models form an encoder-decoder architecture, Optimus aims to learn a compact sentence-level representation, so only a single vector representation is used to link the encoder and decoder (see the sketch below). If dense, per-token representations are the goal instead, T5/BART should be considered.
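To make point 2 concrete, here is a minimal sketch (not the released Optimus code) of how a single latent vector can bridge a BERT encoder and a GPT-2 decoder: the encoder's [CLS] state is compressed into one vector z, and z is projected into the decoder's embedding space as a conditioning token. The class and layer names (`SentenceBottleneck`, `to_latent`, `to_decoder_embed`) are illustrative assumptions; Optimus additionally treats z as a VAE latent sampled from a posterior, rather than the deterministic projection shown here.

```python
# Minimal sketch of a single-vector bottleneck between BERT and GPT-2.
# Not the Optimus implementation; names and wiring are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, GPT2LMHeadModel

class SentenceBottleneck(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-cased")
        self.decoder = GPT2LMHeadModel.from_pretrained("gpt2")
        # Compress the [CLS] hidden state into one compact sentence vector z ...
        self.to_latent = nn.Linear(self.encoder.config.hidden_size, latent_dim)
        # ... and map z into the decoder's embedding space as a conditioning token.
        self.to_decoder_embed = nn.Linear(latent_dim, self.decoder.config.n_embd)

    def forward(self, enc_input_ids, enc_attention_mask, dec_input_ids):
        enc_out = self.encoder(enc_input_ids, attention_mask=enc_attention_mask)
        cls = enc_out.last_hidden_state[:, 0]          # [CLS] vector per sentence
        z = self.to_latent(cls)                        # the single latent vector
        cond = self.to_decoder_embed(z).unsqueeze(1)   # (batch, 1, n_embd)
        tok_embeds = self.decoder.transformer.wte(dec_input_ids)
        # Prepend the conditioning vector so every decoded token can attend to z.
        inputs_embeds = torch.cat([cond, tok_embeds], dim=1)
        return self.decoder(inputs_embeds=inputs_embeds)
```

In contrast, a T5/BART-style setup lets the decoder cross-attend to the full sequence of encoder hidden states, so there is no single vector that summarizes the sentence, which is exactly what Optimus wants to learn.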

This makes sense, thanks :)