question about "doubly stochastic attention"
zym1010 opened this issue · 2 comments
zym1010 commented
As I'm reading the paper, I don't understand why, for the soft attention version, we encourage \sum_{t} a_{ti} \approx 1. Since the weights sum to 1 over locations at each timestep, \sum_{t,i} a_{ti} = C, so a target of C/L seems more appropriate.
kelvinxu commented
Hi, there is a note about this in the paper. What you suggest is correct; for the results we reported, though, we used 1. In our experience this didn't really change the results.
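For readers landing on this thread, here is a minimal NumPy sketch of the doubly stochastic penalty under both targets discussed above. The array name `alpha`, the shapes, and the random weights are illustrative only, not taken from the repo; the penalty form \lambda \sum_i (1 - \sum_t a_{ti})^2 follows the regularizer described in the paper.

```python
import numpy as np

# Illustrative setup: alpha[t, i] are soft-attention weights over L annotation
# locations at each of C decoding timesteps; each row sums to 1 by construction.
C, L = 16, 196                               # e.g. caption length, 14x14 feature locations
rng = np.random.default_rng(0)
alpha = rng.random((C, L))
alpha /= alpha.sum(axis=1, keepdims=True)    # normalize per timestep, like a softmax

per_location = alpha.sum(axis=0)             # \sum_t a_{ti}, one value per location i

# Penalty with target 1 for every location (what the paper reports using):
penalty_one = np.sum((1.0 - per_location) ** 2)

# Alternative target C / L suggested in the question: consistent with the global
# constraint \sum_{t,i} a_{ti} = C when attention is spread uniformly over locations.
penalty_cl = np.sum((C / L - per_location) ** 2)

print(penalty_one, penalty_cl)
```

Either target enters the training loss as an extra term scaled by a small coefficient; the thread above suggests the choice between 1 and C/L made little practical difference in the authors' experiments.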