what is the meaning of attention_size?
Cumberbatch08 opened this issue · 4 comments
Cumberbatch08 commented
ilivans commented
Attention size is the inner size of the attention layer; the alphas are just the probabilities assigned to the tokens. Attention size lets you regulate the 'capacity' of the attention layer.
Cumberbatch08 commented
Thanks. Does attention_size have any physical meaning? How should I set this size?
ilivans commented
It's the linear size of the layer's weights, similar to the size of a fully-connected layer or the hidden size of an RNN. The value should be chosen accordingly, e.g. by grid search or something like that.
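For illustration, here is a minimal NumPy sketch of where attention_size appears in an attention layer of this kind. It is not the repo's actual TensorFlow code; the parameter names w_omega / b_omega / u_omega follow the naming referenced in this thread, and hidden_size * 2 assumes bidirectional RNN outputs are concatenated.

```python
import numpy as np

batch_size, seq_len, hidden_size = 4, 20, 128
attention_size = 64  # the "inner" size being discussed; a free hyperparameter

# Bidirectional RNN outputs, shape (batch, time, hidden_size * 2)
inputs = np.random.randn(batch_size, seq_len, hidden_size * 2)

# Trainable parameters of the attention layer
w_omega = np.random.randn(hidden_size * 2, attention_size) * 0.1
b_omega = np.random.randn(attention_size) * 0.1
u_omega = np.random.randn(attention_size) * 0.1  # context vector, size attention_size

# Inner projection, shape (batch, time, attention_size)
v = np.tanh(inputs @ w_omega + b_omega)

# One unnormalized score per token, shape (batch, time)
vu = v @ u_omega

# alphas: the per-token probabilities mentioned above (softmax over time)
vu = vu - vu.max(axis=1, keepdims=True)
alphas = np.exp(vu) / np.exp(vu).sum(axis=1, keepdims=True)

# Attention output: weighted sum of RNN outputs, shape (batch, hidden_size * 2)
output = (inputs * alphas[..., None]).sum(axis=1)

print(output.shape)  # (4, 256)
```

In this sketch attention_size only sets the width of the inner projection (w_omega, b_omega, u_omega); the output still has size hidden_size * 2, which is why the value can be tuned freely.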
Cumberbatch08 commented
Thanks a lot. u_omega represents the context vector, so I think attention_size should be set to hidden_size * 2.