nouhadziri/THRED

Confused about the topic attention in the original paper

Closed this issue · 2 comments

Sorry to bother you with this. I have read your great paper, but I have some confusion about the topic attention.

In the paper, you said:

The topic words {t1,t2,...,tn} are then linearly combined to form a fixed-length vector k. The weight values are calculated as the following:
[screenshot of the weight equation from the paper]

I can hardly figure it out. Is it the same as normal query-key-value attention? In my opinion, the final context-level encoder hidden state serves as the query, and the word embeddings of the topic words serve as the values. But how are the weights \beta calculated?

Looking forward to your reply! Thanks.

ehsk commented

Thanks for your interest in our work.
Your guess is correct. It is regular attention, and the equation for learning the weights is as follows:

[screenshot of the attention weight equation]

We have made a few changes (including this one) to the paper, which will be published shortly.
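For readers landing here later, below is a minimal NumPy sketch of the mechanism as described in this thread: the final context-level encoder hidden state acts as the query, the topic word embeddings act as keys/values, and the softmax-normalized weights β form the fixed-length topic vector k. The additive (MLP) scoring function η and all names, shapes, and parameters here are illustrative assumptions, not the exact parameterization from the paper or this repository.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def topic_attention(h_context, topic_embeddings, W, U, v):
    """Sketch of topic attention (shapes and scoring function are assumptions).

    h_context        : (d_h,)   final context-level encoder hidden state (query)
    topic_embeddings : (n, d_t) embeddings of the n topic words (keys/values)
    W                : (d_a, d_h) projection of the query
    U                : (d_a, d_t) projection of each topic embedding
    v                : (d_a,)   scoring vector
    Returns the topic vector k and the attention weights beta.
    """
    # Additive attention score eta(t_i, h) = v^T tanh(W h + U t_i) for each topic word
    scores = np.array([
        v @ np.tanh(W @ h_context + U @ t_i) for t_i in topic_embeddings
    ])
    beta = softmax(scores)        # attention weights beta_i, summing to 1
    k = beta @ topic_embeddings   # k = sum_i beta_i * t_i
    return k, beta

# Toy usage with random tensors
rng = np.random.default_rng(0)
d_h, d_t, d_a, n = 8, 6, 5, 4
h = rng.normal(size=d_h)
topics = rng.normal(size=(n, d_t))
W, U, v = rng.normal(size=(d_a, d_h)), rng.normal(size=(d_a, d_t)), rng.normal(size=d_a)
k, beta = topic_attention(h, topics, W, U, v)
print(beta, k.shape)  # weights sum to 1; k has the topic-embedding dimension
```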

Thanks for your reply!