bentrevett/pytorch-seq2seq

Tutorial 6: Question about CrossEntropyLoss

bpaulwitz opened this issue · 1 comment

Hey, thanks for your great tutorial on transformers!

In code cell [24], you have written the following:

output = output.contiguous().view(-1, output_dim)
trg = trg[:,1:].contiguous().view(-1)

#output = [batch size * (trg len - 1), output dim]
#trg = [batch size * (trg len - 1)]

loss = criterion(output, trg)

which does not work for me. I don't know whether the torch API for nn.CrossEntropyLoss has changed, but for me it seems to expect the target to have shape [batch size, output dim].
I find this strange, because how can the loss of the whole transformer be calculated if it only checks the last position? Should I use another loss function instead? CrossEntropyLoss still seems to be the go-to loss function in NLP wherever I look.
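
For reference, here is a minimal, self-contained sketch of the shapes nn.CrossEntropyLoss expects in this setup: raw logits of shape [N, num classes] and integer class indices of shape [N], where N = batch size * (trg len - 1) after flattening. The tensor sizes and the ignore_index=0 padding index below are made-up assumptions for illustration (the tutorial passes the actual pad token index):

import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration
batch_size, trg_len, output_dim = 4, 10, 100

# Assumed pad index of 0; the tutorial uses TRG_PAD_IDX here
criterion = nn.CrossEntropyLoss(ignore_index=0)

# Decoder output: raw logits over the vocabulary for every target position except the last
output = torch.randn(batch_size, trg_len - 1, output_dim)
# Target token indices for the full sequence, including the initial <sos> token
trg = torch.randint(0, output_dim, (batch_size, trg_len))

# Flatten both so the loss sees [N, num classes] logits and [N] class indices,
# where N = batch_size * (trg_len - 1); trg[:, 1:] drops the <sos> token
output = output.contiguous().view(-1, output_dim)   # [N, output_dim]
trg = trg[:, 1:].contiguous().view(-1)              # [N]

loss = criterion(output, trg)
print(loss.item())  # scalar loss averaged over all non-ignored positions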

nvm, I'm an idiot xD