About model naming
helson73 opened this issue · 1 comments
helson73 commented
Hi,
Thanks for releasing dl4mt-cdec !
I have two questions:
- I am wondering whether the suffix "_both" means the attention mechanism is applied to both hidden layers?
- In the bi-scale model, does the suffix "_attc" mean the attention mechanism uses only the first hidden layer (the slower layer)?
Thanks again.
jych commented
Hi,
- Both hidden layers in the decoder are used to compute the alignment weights. The context (Eq. 2) is always used to update both hidden layers.
- Only the slower layer is used to compute the alignment weights.
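A minimal NumPy sketch of the distinction, assuming a simple additive (Bahdanau-style) alignment model. All names and shapes here are illustrative, not taken from the dl4mt-cdec code; the summed query for "_both" is one possible way to feed two layers into the scorer:

```python
import numpy as np

rng = np.random.default_rng(0)
T_src, d = 5, 8                        # source length, hidden size
H_src = rng.normal(size=(T_src, d))    # encoder annotations h_j
h1 = rng.normal(size=d)                # faster decoder layer state
h2 = rng.normal(size=d)                # slower decoder layer state

# Illustrative additive-attention parameters (hypothetical)
Wa = rng.normal(size=(d, d)) * 0.1
Ua = rng.normal(size=(d, d)) * 0.1
va = rng.normal(size=d) * 0.1

def align(query):
    """Softmax alignment weights over source positions for a decoder query."""
    scores = np.tanh(H_src @ Ua.T + query @ Wa.T) @ va
    e = np.exp(scores - scores.max())
    return e / e.sum()

# "_both": both decoder layers feed the alignment model
alpha_both = align(h1 + h2)
# "_attc": only the slower layer feeds the alignment model
alpha_slow = align(h2)

# Either way, the resulting context vector (the analogue of Eq. 2)
# is then used to update both decoder layers.
ctx_both = alpha_both @ H_src
ctx_slow = alpha_slow @ H_src
```

The two variants differ only in which decoder states enter the alignment scorer; the context vector computed from the weights updates both layers in both cases.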