HLTCHKUST/PAML

Applying F.log_softmax on Generator output

Closed this issue · 2 comments

Hi,

You apply `F.log_softmax` on the output of the projection layer at line 232:

```python
return F.log_softmax(logit, dim=-1)
```

If we use `nn.CrossEntropyLoss` as the loss function, the result of `F.log_softmax` is then fed into it at line 333:

```python
loss = self.criterion(logit.contiguous().view(-1, logit.size(-1)), dec_batch.contiguous().view(-1))
```

So the output of the projection layer would go through `F.log_softmax` and then `nn.CrossEntropyLoss`.

However, `nn.CrossEntropyLoss` already applies `F.log_softmax` internally, so I think you should exclude line 232 and instead just return the raw logits from line 218:

```python
logit = self.proj(x)
```
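Worth noting as an aside: even if `nn.CrossEntropyLoss` were applied on top of the log-softmax output, the loss value itself would not change, because `log_softmax` is idempotent (the log-sum-exp of log-probabilities is log 1 = 0). It would only be redundant. A minimal sketch with hypothetical tensor shapes:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)  # hypothetical (batch, vocab) projection output

logp = F.log_softmax(logits, dim=-1)

# log_softmax is idempotent: exp(logp) sums to 1 along dim=-1,
# so logsumexp(logp) = 0 and a second application is a no-op.
assert torch.allclose(logp, F.log_softmax(logp, dim=-1))
```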

What do you think?

Indeed, we use `self.criterion = nn.NLLLoss(ignore_index=config.PAD_idx)` instead of `nn.CrossEntropyLoss`.

You can also use raw logits + `nn.CrossEntropyLoss`.
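For completeness, a minimal sketch (hypothetical shapes and seed, `ignore_index` omitted for brevity) checking that the two setups produce the same loss: `F.log_softmax` in the model followed by `nn.NLLLoss`, versus raw logits fed to `nn.CrossEntropyLoss`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical (batch, vocab) projection output
targets = torch.randint(0, 10, (4,))   # hypothetical gold token indices

# What the repo does: log_softmax inside the model, NLLLoss as the criterion.
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=-1), targets)

# The alternative: raw logits + CrossEntropyLoss, which applies log_softmax internally.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

assert torch.allclose(loss_nll, loss_ce)  # identical up to floating point
```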