kelvinxu/arctic-captions

question about "doubly stochastic attention"

zym1010 opened this issue · 2 comments

As I'm reading the paper, I don't understand why, for the soft attention version, we encourage $\sum_{t} \alpha_{ti} \approx 1$. I feel $C/L$ would be more appropriate, since $\sum_{t,i} \alpha_{ti} = C$.

Hi, we have a note about this in the paper. What you suggest is correct; for the results we reported, though, we used 1. This didn't really change the results in our experience.
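
For anyone reading along, here is a minimal sketch of the two penalty variants being discussed. This is illustrative NumPy, not the repository's Theano code; the names `alpha`, `C`, `L`, and `lam` are assumptions for the example (attention weights over `C` decoding timesteps and `L` annotation locations, with the paper's regularization weight $\lambda$):

```python
import numpy as np

# Hypothetical example data: alpha has shape (C, L), and each row sums to 1
# because attention is a softmax over the L annotation locations at every step.
C, L = 12, 196
rng = np.random.default_rng(0)
alpha = rng.random((C, L))
alpha /= alpha.sum(axis=1, keepdims=True)

lam = 1.0  # regularization weight (lambda in the paper)

# Doubly stochastic penalty as reported: push each location's total attention
# (summed over timesteps) toward 1.
penalty_one = lam * np.sum((1.0 - alpha.sum(axis=0)) ** 2)

# Variant suggested in the question: target C / L instead, since
# sum over t and i of alpha_{ti} equals C, so the per-location average is C / L.
penalty_cl = lam * np.sum((C / L - alpha.sum(axis=0)) ** 2)

print(penalty_one, penalty_cl)
```

Either version just adds a quadratic term to the training loss; per the reply above, the choice of target did not noticeably change the reported results.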

@kelvinxu thanks!