ilivans/tf-rnn-attention

what is the meaning of attention_size?

Cumberbatch08 opened this issue · 4 comments

In the attention part, attention_size is a hyperparameter, but when we calculate alpha, the shape of alpha has nothing to do with attention_size. So what does attention_size actually do?

Attention size is the inner size of the attention layer; the alphas are just probabilities over the tokens. Attention size lets you regulate the 'capacity' of the attention layer.
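To see why the shape of alphas does not involve attention_size, here is a minimal NumPy sketch of the mechanism. The names w_omega, b_omega, and u_omega follow the repository's attention code; the concrete sizes are made-up examples, not values from the repo:

```python
import numpy as np

batch, seq_len, hidden = 2, 5, 8  # hypothetical shape of the RNN outputs
attention_size = 16               # the hyperparameter in question

inputs = np.random.randn(batch, seq_len, hidden)   # RNN outputs
w_omega = np.random.randn(hidden, attention_size)
b_omega = np.random.randn(attention_size)
u_omega = np.random.randn(attention_size)

v = np.tanh(inputs @ w_omega + b_omega)  # [batch, seq_len, attention_size]
vu = v @ u_omega                         # [batch, seq_len]: attention_size is contracted away here
alphas = np.exp(vu) / np.exp(vu).sum(axis=1, keepdims=True)  # softmax over tokens
output = (inputs * alphas[..., None]).sum(axis=1)            # [batch, hidden]

print(alphas.shape)  # (2, 5) -- one probability per token, independent of attention_size
```

attention_size only sets the width of the intermediate projection v; it is summed out when v is dotted with u_omega, which is why alphas has shape [batch, seq_len] regardless of its value.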

Thanks. Does attention_size have any physical meaning? How should this size be set?

It's the linear size of the layer's weights, similar to the size of a fully-connected layer or the hidden size of an RNN. The value should be chosen accordingly, by means of grid search or something like that.
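For concreteness, a grid search over attention_size could look like the sketch below. train_and_eval is a hypothetical placeholder for whatever training and validation routine you already have, and the candidate values are arbitrary:

```python
import random

def train_and_eval(attention_size):
    # Hypothetical stand-in: in a real run, build the model with this
    # attention_size, train it, and return a validation metric.
    return random.random()

best_size, best_score = None, float("-inf")
for size in (32, 64, 128, 256):     # arbitrary candidate values
    score = train_and_eval(size)    # e.g. validation accuracy
    if score > best_score:
        best_size, best_score = size, score

print(best_size)
```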

Thanks a lot. u_omega represents the context vector, so I think attention_size should be set to hidden_size * 2.