huggingface/open-muse

ReLU Squared

isamu-isozaki opened this issue · 0 comments

https://arxiv.org/abs/2109.08668
The paper claims this performs better than GELU in an autoregressive language modeling setting. The idea is just to apply ReLU and then square the result, i.e. max(x, 0)^2.
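
For reference, a minimal sketch of the activation in plain Python, so it runs without any dependencies. The names `relu_squared` and `relu_squared_grad` are made up for illustration and are not taken from the paper or from this repo. In PyTorch the same thing would presumably be a one-liner along the lines of `F.relu(x) ** 2`.

```python
def relu_squared(x: float) -> float:
    """Squared ReLU: apply ReLU, then square the result."""
    r = max(x, 0.0)  # ReLU: negative inputs are clamped to zero
    return r * r     # square the rectified value


def relu_squared_grad(x: float) -> float:
    """Derivative of squared ReLU: 2 * max(x, 0)."""
    # Unlike plain ReLU, the slope is continuous at x = 0.
    return 2.0 * max(x, 0.0)


if __name__ == "__main__":
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, relu_squared(value), relu_squared_grad(value))
```

One thing to keep in mind is that the output grows quadratically for positive inputs rather than linearly, so activation magnitudes may behave differently from GELU.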