Question: why this choice of BERT and GPT2?
alxthm opened this issue · 2 comments
alxthm commented
Hi,
Thank you for this work and for releasing the code as well!
I was wondering if there was any reason you chose to use BERT as an encoder and GPT2 as a decoder, instead of other pretrained language models? In particular, why not consider models that already have an encoder/decoder architecture, such as T5 or BART?
Thanks
ChunyuanLI commented
- Historically, T5/BART were not yet publicly released when we started the Optimus project.
- Though all of these models have an encoder-decoder architecture, Optimus aims to learn a compact sentence-level representation, so only a single vector is used to link the encoder and the decoder (see the sketch below); if dense, token-level representations were the goal, T5/BART would be the natural choice.
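For intuition, here is a minimal sketch of what "a single vector linking encoder and decoder" can look like with Hugging Face transformers. The model names, projection layers, latent size, and the prepended-embedding conditioning scheme below are illustrative assumptions, not the exact Optimus implementation:

```python
# Sketch only: a single latent vector z links a BERT encoder and a GPT-2 decoder.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer, GPT2LMHeadModel, GPT2Tokenizer

latent_dim = 32  # assumed size of the sentence-level latent vector

bert = BertModel.from_pretrained("bert-base-uncased")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

# Encoder side: compress BERT's [CLS] hidden state into one vector z.
to_latent = nn.Linear(bert.config.hidden_size, latent_dim)
# Decoder side: map z into GPT-2's embedding space and prepend it as an
# extra "prompt" embedding (one simple way to condition the decoder on z).
to_embed = nn.Linear(latent_dim, gpt2.config.n_embd)

def encode(text, tokenizer):
    inputs = tokenizer(text, return_tensors="pt")
    cls_hidden = bert(**inputs).last_hidden_state[:, 0]   # (1, hidden_size)
    return to_latent(cls_hidden)                          # (1, latent_dim)

def decode_logits(z, input_ids):
    tok_embeds = gpt2.transformer.wte(input_ids)          # (1, T, n_embd)
    z_embed = to_embed(z).unsqueeze(1)                    # (1, 1, n_embd)
    inputs_embeds = torch.cat([z_embed, tok_embeds], dim=1)
    return gpt2(inputs_embeds=inputs_embeds).logits

bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")

z = encode("A compact sentence-level representation.", bert_tok)
ids = gpt2_tok("A compact sentence", return_tensors="pt").input_ids
logits = decode_logits(z, ids)  # GPT-2 predictions conditioned on z
```

The point is that the entire sentence has to pass through `z`, whereas a T5/BART-style encoder-decoder would pass the full sequence of encoder hidden states to the decoder via cross-attention.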
alxthm commented
This makes sense, thanks :)