Confused about the topic attention in the original paper
Closed this issue · 2 comments
Helicqin commented
Sorry to bother you with this. I have read your great paper, but I have some confusion about the topic attention.
In the paper, you said:
The topic words {t1,t2,...,tn} are then linearly combined to form a fixed-length vector k. The weight values are calculated as the following:
I can't quite figure it out. Is it the same as the usual query-key-value attention? My guess is that the final context-level encoder hidden state serves as the query and the word embeddings of the topic words serve as the values. But how are the weights \beta calculated?
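To make my guess concrete, here is a minimal sketch of what I imagine the computation to be, assuming plain dot-product attention (the names `h_ctx`, `topic_emb`, etc. are mine, not from the paper):

```python
import torch
import torch.nn.functional as F

# My guess at the topic attention (dot-product variant; all names are mine):
# h_ctx     : (d,)   final context-level encoder hidden state, used as the query
# topic_emb : (n, d) embeddings of the n topic words, used as keys/values
d, n = 128, 5
h_ctx = torch.randn(d)
topic_emb = torch.randn(n, d)

scores = topic_emb @ h_ctx           # (n,) unnormalized scores
beta = F.softmax(scores, dim=0)      # (n,) attention weights \beta
k = beta @ topic_emb                 # (d,) fixed-length topic vector k
```

Is this roughly what the paper does, or are the weights computed differently (e.g. with an additive/MLP scoring function or a different query)?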
Looking forward to your reply! Thanks.
ehsk commented
Helicqin commented
Thanks for your reply!