Which Transformer architecture was used for the generative step?
17521121 commented
I saw in the paper that you use the full Transformer architecture (Vaswani et al.,
2017), but there are many Transformer architectures now (GPT-2, BERT, RoBERTa, ...), and each architecture suits its own tasks.
EricMichaelSmith commented
Yes - for generation we used the original encoder-decoder Transformer architecture from Vaswani et al. (2017), as opposed to those other, newer architectures.
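
To make the distinction concrete, here is a minimal PyTorch sketch (not this repo's actual model code) of the full encoder-decoder architecture the paper refers to; GPT-2 keeps only the decoder stack and BERT only the encoder stack, whereas the original model uses both. The vocabulary size and token tensors below are hypothetical placeholders, the layer sizes are the paper's base configuration, and positional encodings are omitted for brevity:

```python
# A minimal sketch of the full encoder-decoder Transformer
# (Vaswani et al., 2017), using PyTorch's built-in nn.Transformer.
import torch
import torch.nn as nn

vocab_size = 32000            # hypothetical vocabulary size
d_model = 512

embed = nn.Embedding(vocab_size, d_model)
model = nn.Transformer(
    d_model=d_model,
    nhead=8,
    num_encoder_layers=6,     # encoder stack: encodes the input context
    num_decoder_layers=6,     # decoder stack: generates the output tokens
    dim_feedforward=2048,
)
lm_head = nn.Linear(d_model, vocab_size)

# src: context tokens; tgt: output tokens generated so far
src = embed(torch.randint(vocab_size, (20, 1)))  # (src_len, batch, d_model)
tgt = embed(torch.randint(vocab_size, (5, 1)))   # (tgt_len, batch, d_model)

# Causal mask so each decoder position attends only to earlier positions
tgt_mask = model.generate_square_subsequent_mask(tgt.size(0))

out = model(src, tgt, tgt_mask=tgt_mask)         # (tgt_len, batch, d_model)
logits = lm_head(out)                            # per-position next-token logits
```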