amritasaha1812/CSQA_Code

`kvmem` vs `decoder` loss?

Closed this issue · 1 comment

What is the difference between kvmem loss and decoder loss? How do you decide which to use?

Both the decoder loss and the kvmem loss are required to train the model.
Decoder loss: This loss trains the model to output the correct tokens in natural language, i.e. it reduces the cross-entropy between the predicted and target vocabulary probability distributions at each decoding step.
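A minimal sketch of what such a per-step decoder loss looks like (illustrative only, written in PyTorch; the sizes and tensors below are hypothetical and this is not the repository's actual implementation):

```python
import torch
import torch.nn.functional as F

batch, steps, vocab = 4, 7, 100                      # hypothetical sizes
logits = torch.randn(batch, steps, vocab)            # decoder outputs per decoding step
targets = torch.randint(0, vocab, (batch, steps))    # gold token ids per step

# Average cross-entropy between predicted vocab distributions and gold tokens
decoder_loss = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))
print(decoder_loss.item())
```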

kvmem loss: This loss trains the key-value memory network. The model should learn to assign attention weights over memory locations such that the memory values corresponding to the gold target entities (which are given as input to the model in training mode) receive high weight. We minimize the cross-entropy between the predicted weight distribution and the gold distribution.
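A minimal sketch of this idea (again illustrative PyTorch, not the repo's code; the slot layout and gold distribution below are assumptions for the example):

```python
import torch
import torch.nn.functional as F

batch, slots = 4, 50                          # hypothetical sizes
mem_scores = torch.randn(batch, slots)        # unnormalized attention scores over memory slots
gold = torch.zeros(batch, slots)
gold[:, 0] = 1.0                              # pretend slot 0 holds the gold target entity
gold = gold / gold.sum(dim=1, keepdim=True)   # normalize to a distribution

# Cross-entropy between the gold distribution and the predicted attention weights
log_weights = F.log_softmax(mem_scores, dim=1)
kvmem_loss = -(gold * log_weights).sum(dim=1).mean()

# Both losses are then combined to train the full model, e.g.
# total_loss = decoder_loss + kvmem_loss
print(kvmem_loss.item())
```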