Confused about the topic attention in the original paper
Closed this issue · 2 comments
Helicqin commented
Sorry to bother you with this. I have read your great paper, but I have some confusion about the topic attention.
In the paper, you said:
The topic words {t1,t2,...,tn} are then linearly combined to form a fixed-length vector k. The weight values are calculated as the following:
I can't quite figure it out. Is it the same as the usual query-key-value attention? My guess is that the final context-level encoder hidden state serves as the query and the word embeddings of the topic words serve as the values. But how are the weights \beta calculated?
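To make my guess concrete, here is a minimal sketch of what I imagine the computation to be, assuming plain dot-product attention (the names `h_ctx`, `topic_emb`, etc. are mine, not from the paper):

```python
import torch
import torch.nn.functional as F

# My guess at the topic attention (dot-product variant; all names are mine):
# h_ctx     : (d,)   final context-level encoder hidden state, used as the query
# topic_emb : (n, d) embeddings of the n topic words, used as keys/values
d, n = 128, 5
h_ctx = torch.randn(d)
topic_emb = torch.randn(n, d)

scores = topic_emb @ h_ctx           # (n,) unnormalized scores
beta = F.softmax(scores, dim=0)      # (n,) attention weights \beta
k = beta @ topic_emb                 # (d,) fixed-length topic vector k
```

Is this roughly what the paper does, or are the weights computed differently (e.g. with an additive/MLP scoring function or a different query)?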
Looking forward to your reply! Thanks.
ehsk commented
Helicqin commented
Thanks for your reply!