word_language_model: is it a Transformer, encoder-only, or decoder-only?
efg001 opened this issue · 1 comment
📚 Documentation
The documentation says word_language_model uses an RNN/Transformer, but I am having trouble understanding exactly what kind of Transformer it is.
Looking at the input and target sequences, it seems like it is a generative model where the expected output is shifted by 1 (i.e. the model is trained to generate words based on a prefix), as in the batching code below:
https://github.com/pytorch/examples/blob/main/word_language_model/main.py#L140
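For context, the batching in main.py builds the target by shifting the input one token to the right. A rough sketch of that logic (my paraphrase, not the exact code; `bptt` is the sequence length the example uses):

```python
import torch

def get_batch(source: torch.Tensor, i: int, bptt: int = 35):
    # source: (num_steps, batch_size) tensor of token ids
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]                        # tokens at positions i .. i+seq_len-1
    target = source[i + 1:i + 1 + seq_len].reshape(-1)  # the same positions shifted by 1
    return data, target
```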
However, I see the output of the encoder is re-wired as the input to the decoder here:
https://github.com/pytorch/examples/blob/main/word_language_model/model.py#L143
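For reference, the forward pass I am asking about looks roughly like this (my paraphrase of model.py, not the exact code; attribute names approximate):

```python
import math
import torch
import torch.nn.functional as F

def forward(self, src, has_mask=True):
    if has_mask:
        # Causal mask: position t may only attend to positions <= t.
        sz = len(src)
        mask = torch.triu(torch.full((sz, sz), float('-inf'), device=src.device), diagonal=1)
    else:
        mask = None
    x = self.input_emb(src) * math.sqrt(self.ninp)  # token embedding
    x = self.pos_encoder(x)                         # positional encoding
    output = self.encoder(x, mask=mask)             # nn.Transformer's *encoder* stack, used alone
    output = self.decoder(output)                   # self.decoder is just a Linear to vocab size
    return F.log_softmax(output, dim=-1)
```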
As a reference, since the documentation says that word_language_model implements both an RNN and a Transformer model, I looked at PyTorch's implementation of the Transformer here:
https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/transformer.py#L273-L279
PyTorch's implementation aligns with what the paper proposed, where the input to the encoder is src (the input sequence) and the input to the decoder is tgt (the shifted target sequence).
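For comparison, a vanilla nn.Transformer takes both sequences, and its decoder attends to the encoder's output (the "memory") through cross-attention; a minimal usage sketch with made-up shapes:

```python
import torch
import torch.nn as nn

transformer = nn.Transformer(d_model=512, nhead=8)

src = torch.rand(10, 32, 512)  # (src_len, batch, d_model): the input sequence
tgt = torch.rand(20, 32, 512)  # (tgt_len, batch, d_model): the shifted target sequence

# Internally: memory = encoder(src); out = decoder(tgt, memory), where the decoder
# cross-attends to memory. This is the wiring the example does NOT use.
out = transformer(src, tgt)    # (tgt_len, batch, d_model)
```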
So, because of this rewiring, word_language_model is obviously not a vanilla (encoder-decoder) Transformer for generating text.
Since it uses the vanilla Transformer model and the built-in cross-attention in the decoder is not removed, it is not a decoder-only model either.
And since it is trained to generate text, I don't think it can be understood as an encoder-only model.
Can someone help me understand why the output of the encoder is re-wired into the decoder as its input, instead of going through cross-attention, and whether the docs need to be updated to reflect what the model is doing, or the code simplified to use a decoder-only model?
nvm, it's a decoder-only model.
The "encoder" here is effectively the decoder; what the code calls the decoder is just the output projection:
self.decoder = nn.Linear(nhid, ntoken)
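To spell that out: what the code calls decoder is only the output projection to the vocabulary, and an nn.TransformerEncoder stack run with a causal mask is exactly a decoder-only language model (a decoder block minus cross-attention). A minimal self-contained sketch of that equivalence (hyperparameters invented, positional encoding omitted for brevity):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDecoderOnlyLM(nn.Module):
    """Encoder layers + causal mask + linear head == decoder-only LM."""
    def __init__(self, ntoken=10000, d_model=200, nhead=2, nhid=200, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(ntoken, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=nhid)
        self.blocks = nn.TransformerEncoder(layer, nlayers)  # self-attention only, no cross-attention
        self.lm_head = nn.Linear(d_model, ntoken)            # plays the role of `self.decoder` above
        self.d_model = d_model

    def forward(self, src):
        # src: (seq_len, batch) of token ids
        sz = src.size(0)
        causal_mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
        x = self.embed(src) * math.sqrt(self.d_model)
        x = self.blocks(x, mask=causal_mask)
        return F.log_softmax(self.lm_head(x), dim=-1)

model = TinyDecoderOnlyLM()
tokens = torch.randint(0, 10000, (35, 16))  # (seq_len, batch)
log_probs = model(tokens)                   # (35, 16, 10000)
```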