Use of spatial dropout
jlmaccal opened this issue · 2 comments
jlmaccal commented
Could you clarify if spatial dropout is used? The paper suggests that it is, but the code seems to use standard dropout.
jerrybai1995 commented
Hi @jlmaccal,
We originally used spatial dropout (which is simply nn.Dropout1d in PyTorch), but it was later pointed out to us that standard dropout works better on language modeling tasks. The repo was therefore changed to use standard dropout.
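For readers comparing the two variants, here is a minimal sketch of how they differ. It assumes a PyTorch version that provides nn.Dropout1d. The tensor shape and dropout rate are illustrative choices, not values taken from the repo. The helper channel_is_uniform is hypothetical and exists only for this comparison.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy activations shaped (batch, channels, time), as a 1D conv layer would emit.
# The sizes and p=0.5 are assumptions made for this illustration only.
x = torch.ones(2, 8, 16)

standard = nn.Dropout(p=0.5)    # zeroes individual elements independently
spatial = nn.Dropout1d(p=0.5)   # zeroes entire channels (all time steps at once)

# Freshly constructed modules are in training mode, so dropout is active here.
y_standard = standard(x)
y_spatial = spatial(x)


def channel_is_uniform(y):
    """Hypothetical helper: True where all time steps of a channel share one value."""
    return (y == y[..., :1]).all(dim=-1)


# With spatial dropout each channel is either fully kept or fully dropped.
spatial_uniform = bool(channel_is_uniform(y_spatial).all())
# With standard dropout the zeros are scattered within each channel.
standard_uniform = bool(channel_is_uniform(y_standard).all())

print("spatial dropout masks whole channels:", spatial_uniform)
print("standard dropout masks whole channels:", standard_uniform)
```

Both modules rescale the surviving activations by 1 / (1 - p), so the only difference is the granularity of the mask: per element for nn.Dropout, per channel for nn.Dropout1d.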
jlmaccal commented
Thanks for the clarification.