markdtw/soft-attention-image-captioning

Purpose of mask_ph

Closed this issue · 3 comments

Thank you for your code which is really helpful for my learning.
In the function that calculates the cross-entropy loss over a candidate sentence, what is the purpose of the variable "mask_ph"? The per-token loss tensor is multiplied element-wise by mask_ph, and the result is then divided by the sum of mask_ph.

Best wishes
James

Each element of the mask is either 1 or 0, depending on the ground-truth sentence length.
The function finalCaptions() in utils.py computes the sentence length and builds a mask with 1s up to the length of the sentence and 0s for the remaining padded positions, out to the maximum caption length.
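
For what it's worth, here is a minimal NumPy sketch of that kind of mask construction (the function name and shapes are hypothetical, not the repo's actual code):

```python
import numpy as np

def build_mask(sentence_lengths, max_len):
    """Return a (batch, max_len) float mask: 1 for real tokens, 0 for padding."""
    mask = np.zeros((len(sentence_lengths), max_len), dtype=np.float32)
    for i, length in enumerate(sentence_lengths):
        mask[i, :length] = 1.0  # ones up to the true caption length
    return mask

# e.g. two captions of length 3 and 5, padded to max_len = 6
print(build_mask([3, 5], 6))
# [[1. 1. 1. 0. 0. 0.]
#  [1. 1. 1. 1. 1. 0.]]
```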

hi, sorry for the super late reply

The reason for masking is to exclude the trailing padded positions after the caption ends. If we didn't mask out those redundant positions, the loss would include their contributions, which would make the loss calculation wrong.

AFAIK this is a common way to deal with the fact that an LSTM can't natively handle variable-length outputs.
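
As a rough illustration of the idea (plain NumPy, not the repo's TensorFlow code): the per-token cross-entropy is multiplied by the mask so padded positions contribute zero, and the sum is normalized by the number of real tokens.

```python
import numpy as np

def masked_cross_entropy(token_losses, mask):
    """token_losses and mask are (batch, max_len); mask is 1 for real tokens, 0 for padding."""
    masked = token_losses * mask       # zero out losses at padded positions
    return masked.sum() / mask.sum()   # average over real tokens only

token_losses = np.array([[0.5, 0.2, 0.9, 3.0],   # the 3.0 sits at a padded slot
                         [0.1, 0.4, 0.3, 0.2]])
mask = np.array([[1., 1., 1., 0.],
                 [1., 1., 1., 1.]])
print(masked_cross_entropy(token_losses, mask))  # the 3.0 at the padded position is ignored
```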

Thanks!

Hi Mark,
That makes perfect sense! Thanks. I'm going to try training a model today.
Best wishes!