nyu-dl/dl4mt-cdec

About model naming

helson73 opened this issue · 1 comment

Hi,
Thanks for releasing dl4mt-cdec !
I have two questions:

  1. I am wondering if the suffix "_both" means that the attention mechanism is applied to both hidden layers?
  2. In the bi-scale model, does the suffix "_attc" mean that the attention mechanism is applied only to the first hidden layer (the slower layer)?

Thanks again.
jych commented

Hi,

  1. Both hidden layers in the decoder were used to compute the alignment weights. The context (Eq. 2) is always used to update both hidden layers.
  2. Only the slower layer was used to compute the alignment weights (see the sketch below for the contrast).
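
A minimal NumPy sketch of the contrast between the two settings, assuming a Bahdanau-style additive attention. This is not the repository's Theano code; all names, shapes, and parameters here are illustrative assumptions.

```python
import numpy as np

def alignment_weights(h_faster, h_slower, annotations,
                      W_faster, W_slower, W_ann, v, use_both):
    """Illustrative alignment weights over source annotations.

    use_both=True  -> "_both"-style: both decoder hidden layers feed the scores.
    use_both=False -> only the slower layer feeds the scores.
    Hypothetical shapes: h_* are vectors, annotations is (T, d_ann),
    and the projection matrices map into a shared attention dimension.
    """
    query = W_slower @ h_slower
    if use_both:
        query = query + W_faster @ h_faster
    energies = np.tanh(annotations @ W_ann.T + query)  # (T, d_att)
    scores = energies @ v                              # (T,)
    weights = np.exp(scores - scores.max())            # softmax over source positions
    return weights / weights.sum()
```

In either case, the resulting context vector (the weighted sum of the annotations, Eq. 2) is then used to update both decoder hidden layers, as noted in the first answer.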