ReLU Squared
isamu-isozaki opened this issue · 0 comments
isamu-isozaki commented
https://arxiv.org/abs/2109.08668
The paper claims squared ReLU performs better than GELU in an autoregressive language modeling setting. The idea is simply to apply ReLU and then square the result.
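A minimal sketch of the activation as described, in plain Python on scalars so it runs without any framework. The function names `relu_squared` and `relu_squared_grad` are my own placeholders, not names from the paper or from any library.

```python
def relu_squared(x: float) -> float:
    """Squared ReLU: apply ReLU, then square the result, i.e. max(0, x) ** 2."""
    r = max(0.0, x)  # ReLU step: negative inputs become 0
    return r * r     # squaring step


def relu_squared_grad(x: float) -> float:
    """Derivative of squared ReLU with respect to x: 2 * max(0, x)."""
    return 2.0 * max(0.0, x)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 0.5, 3.0):
        print(value, relu_squared(value), relu_squared_grad(value))
```

In a tensor framework the same thing would be an elementwise ReLU followed by an elementwise square. Unlike plain ReLU, the derivative is continuous at zero, since it is 0 on both sides.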