nyu-dl/dl4mt-cdec

About model naming

helson73 opened this issue · 1 comment

Hi,
Thanks for releasing dl4mt-cdec !
I have two questions:

  1. I am wondering if the suffix "_both" means that the attention mechanism is applied to both hidden layers?
  2. In the bi-scale model, does the suffix "_attc" mean that the attention mechanism is applied only to the first hidden layer (the slower layer)?

Thanks again.
jych commented

Hi,

  1. Both hidden layers in the decoder were used to compute the alignment weights. The context (Eq. 2) is always used to update both hidden layers.
  2. Only the slower layer was used to compute the alignment weights (see the sketch below for the contrast).
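
A minimal NumPy sketch of the contrast between the two settings, assuming a Bahdanau-style additive attention. This is not the repository's Theano code; all names, shapes, and parameters here are illustrative assumptions.

```python
import numpy as np

def alignment_weights(h_faster, h_slower, annotations,
                      W_faster, W_slower, W_ann, v, use_both):
    """Illustrative alignment weights over source annotations.

    use_both=True  -> "_both"-style: both decoder hidden layers feed the scores.
    use_both=False -> only the slower layer feeds the scores.
    Hypothetical shapes: h_* are vectors, annotations is (T, d_ann),
    and the projection matrices map into a shared attention dimension.
    """
    query = W_slower @ h_slower
    if use_both:
        query = query + W_faster @ h_faster
    energies = np.tanh(annotations @ W_ann.T + query)  # (T, d_att)
    scores = energies @ v                              # (T,)
    weights = np.exp(scores - scores.max())            # softmax over source positions
    return weights / weights.sum()
```

In either case, the resulting context vector (the weighted sum of the annotations, Eq. 2) is then used to update both decoder hidden layers, as noted in the first answer.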