PyTorch implementation of sparsemax, from http://proceedings.mlr.press/v48/martins16.pdf (International Conference on Machine Learning 2016).
Sparsemax is an activation function similar to softmax, but it can produce sparse probability distributions over its inputs. This makes it interesting for attention models.
I tested replacing the softmax activation in the last layer with sparsemax, and it gives similar results.
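For reference, the forward pass of sparsemax from the paper can be sketched in plain NumPy (this is only an illustrative sketch of the algorithm, not the repository's PyTorch code): sort the input, find the support size, compute the threshold tau, and clip.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of z
    onto the probability simplex, which yields sparse probabilities."""
    z = np.asarray(z, dtype=float)
    # Sort in decreasing order.
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    # Support: indices where 1 + k * z_(k) > cumulative sum of z_(1..k).
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    # Threshold tau such that the clipped values sum to 1.
    tau = (cumsum[support][-1] - 1) / k_z
    return np.maximum(z - tau, 0.0)

p = sparsemax([1.0, 0.5, -1.0])
# p sums to 1 and the smallest input gets exactly zero probability.
```

Unlike softmax, entries far below the threshold receive exactly zero probability rather than a small positive value, which is the property that makes the output sparse.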
Coded by Max Raphael Sobroza Marques