Tutorial 6: Question to CrossEntropyLoss
bpaulwitz opened this issue · 1 comment
bpaulwitz commented
Hey, thanks for your great tutorial on transformers!
In Code Section [24] you have written the following:
output = output.contiguous().view(-1, output_dim)
trg = trg[:,1:].contiguous().view(-1)
#output = [batch size * trg len - 1, output dim]
#trg = [batch size * trg len - 1]
loss = criterion(output, trg)
which does not work for me. I don't know whether the torch API for nn.CrossEntropyLoss has changed, but for me it expects the target to have the shape [batch size, output dim].
I think this is weird, because how can the loss of the whole transformer be calculated if it only checks the last bit? Should I use another loss function now? CrossEntropyLoss still seems to be the go-to function in NLP wherever I look.
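For reference, here is a standalone sketch (tensor sizes are made up, not from the tutorial) of why the flattening works: nn.CrossEntropyLoss expects an input of shape [N, C] and a target of shape [N] containing class indices, so collapsing the batch and sequence dimensions makes every token position contribute to the loss, not just the last one.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration
batch_size, trg_len, output_dim = 4, 10, 100

# Simulated decoder output and target tokens; in the tutorial the decoder
# predicts trg_len - 1 tokens because the target is shifted by one position
output = torch.randn(batch_size, trg_len - 1, output_dim)  # [batch size, trg len - 1, output dim]
trg = torch.randint(0, output_dim, (batch_size, trg_len))  # [batch size, trg len]

criterion = nn.CrossEntropyLoss()

# Collapse batch and time dimensions so the loss covers every token position
output_flat = output.contiguous().view(-1, output_dim)  # [batch size * (trg len - 1), output dim]
trg_flat = trg[:, 1:].contiguous().view(-1)             # [batch size * (trg len - 1)]

loss = criterion(output_flat, trg_flat)
print(output_flat.shape, trg_flat.shape, loss.item())
```

The `[:, 1:]` slice drops the `<sos>` token from the target so predictions and targets line up, which is why both flattened tensors end up with `batch size * (trg len - 1)` rows.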
bpaulwitz commented
Never mind, I'm an idiot xD