"nn.TransformerEncoderLayer" is adopted to construct the "conditional transformer" in your paper.
fido20160817 opened this issue · 0 comments
fido20160817 commented
Thanks for your great work.
I noticed that you use "nn.TransformerEncoderLayer" to construct the "conditional transformer". Since the model is used to predict the next token index, I am wondering whether a transformer decoder would be more appropriate for this purpose. What is the reason you did not adopt "nn.TransformerDecoderLayer"?
Is it because the structure of "nn.TransformerEncoderLayer" is simpler or more concise than that of "nn.TransformerDecoderLayer"?
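To make the comparison concrete, here is a minimal sketch of the two layers using stock PyTorch. It is not taken from the repository; the sizes, the variable names and the causal mask are my own assumptions for illustration. It shows that "nn.TransformerEncoderLayer" with a causal "src_mask" already behaves autoregressively (changing the last token leaves earlier outputs untouched), while "nn.TransformerDecoderLayer" additionally expects a "memory" tensor for its cross-attention block and carries more parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary illustrative sizes (assumptions, not the paper's settings).
d_model, n_head, seq_len, batch = 32, 4, 6, 2

# Boolean causal mask: True marks positions that must NOT be attended to,
# i.e. everything strictly to the right of the current token.
causal_mask = torch.triu(
    torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
)

# Encoder layer: self-attention + feed-forward. dropout=0.0 keeps it deterministic.
enc_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_head, dropout=0.0, batch_first=True
)

x = torch.randn(batch, seq_len, d_model)
out = enc_layer(x, src_mask=causal_mask)

# Perturb only the last token; with a causal mask the earlier outputs
# should not change, which is what next-token prediction requires.
x2 = x.clone()
x2[:, -1] = torch.randn(batch, d_model)
out2 = enc_layer(x2, src_mask=causal_mask)
prefix_unchanged = torch.allclose(out[:, :-1], out2[:, :-1], atol=1e-6)

# Decoder layer: self-attention + cross-attention over `memory` + feed-forward.
dec_layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=n_head, dropout=0.0, batch_first=True
)
memory = torch.randn(batch, 3, d_model)  # hypothetical encoder output
dec_out = dec_layer(x, memory, tgt_mask=causal_mask)

n_enc_params = sum(p.numel() for p in enc_layer.parameters())
n_dec_params = sum(p.numel() for p in dec_layer.parameters())

print("earlier positions unaffected by last token:", prefix_unchanged)
print("encoder-layer params:", n_enc_params)
print("decoder-layer params:", n_dec_params)
```

If this reading is right, the decoder layer would mainly be needed when there is a separate encoder output to cross-attend to; whether that applies to the conditional transformer here is exactly what I would like to confirm.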