HFAiLab/clip-gen

"nn.TransformerEncoderLayer" is adopted to construct the "conditonal transformer" in your paper.

fido20160817 opened this issue · 0 comments

Thanks for your great work.

I noticed that you use "nn.TransformerEncoderLayer" when constructing the "conditional transformer". Since the model is used to predict the next token index, I am wondering whether the decoder of the transformer would be more appropriate for this purpose. What is the reason you don't adopt "nn.TransformerDecoderLayer"?

Is it because the structure of "nn.TransformerEncoderLayer" is simpler and more concise than that of "nn.TransformerDecoderLayer"?
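For context, here is a minimal sketch of how I understand a decoder-only (GPT-style) next-token model can be built from "nn.TransformerEncoderLayer" plus a causal attention mask, since "nn.TransformerDecoderLayer" mainly adds a cross-attention block that expects encoder memory. This is only my own illustration with made-up hyperparameters (`vocab_size`, `embed_dim`, etc.), not the code from this repo:

```python
import torch
import torch.nn as nn

# Sketch only (not this repo's implementation): a decoder-only,
# GPT-style model built from nn.TransformerEncoderLayer.
class CausalTransformer(nn.Module):
    def __init__(self, vocab_size=1024, embed_dim=512, n_head=8, n_layer=6, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_head, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq_len) token indices
        seq_len = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb[:, :seq_len]
        # Causal mask: position i may only attend to positions <= i,
        # which makes the self-attention-only stack autoregressive,
        # i.e. usable for next-token prediction without cross-attention.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=idx.device),
            diagonal=1,
        )
        x = self.blocks(x, mask=mask)
        return self.head(x)  # logits over the next token at every position


model = CausalTransformer()
tokens = torch.randint(0, 1024, (2, 16))  # (batch, seq_len)
logits = model(tokens)                    # (2, 16, 1024)
```

If that is the reasoning, then the encoder layer with a causal mask already covers the next-token-prediction case, and the decoder layer would only be needed if you wanted cross-attention to a separate encoder output.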