
[NeurIPS 2023] Softmax Output Approximation for Activation Memory-Efficient Training of Attention-based Networks

  • Under refactoring.


This repository is the open-source code for the NeurIPS 2023 paper "Softmax Output Approximation for Activation Memory-Efficient Training of Attention-based Networks". (Paper link)

It provides the proposed softmax output approximation function in Python/PyTorch and demonstrates it on a machine translation task using the Transformer and the Multi30k dataset, as in the paper's experiments.

Software Install and Code Cloning

The approximation function is implemented in Python and PyTorch and assumes a GPU.

Step 1. Install Python (>= 3.8).

Step 2. Install PyTorch (>= 1.12.1).

Step 3. Clone this Softmax output approximation repository.

> git clone https://github.com/eai-lab/SoftmaxOutputApproximation.git
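
The version requirements in Steps 1 and 2 can be checked before running anything. The snippet below is a minimal sketch and not part of this repository: the helper names `parse_version` and `check_environment` are hypothetical, and it only reads the installed package metadata without importing PyTorch.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)      # from Step 1
MIN_TORCH = (1, 12, 1)   # from Step 2


def parse_version(text):
    """Turn '1.12.1+cu113' into (1, 12, 1); build suffixes are dropped."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups("0"))


def check_environment():
    """Return a dict of booleans saying whether each requirement is met."""
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON, "torch_ok": False}
    try:
        report["torch_ok"] = parse_version(metadata.version("torch")) >= MIN_TORCH
    except metadata.PackageNotFoundError:
        pass  # PyTorch is not installed for this interpreter
    return report
```

Calling `check_environment()` returns something like `{"python_ok": True, "torch_ok": True}` on a machine that meets both requirements. GPU visibility can be confirmed separately with `torch.cuda.is_available()`.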

How to use the Softmax output approximation function

Step 1. Import our function along with the user-specified hyperparameter m.

from approximation_method import *

Step 2. In transformer_train.py, decide how many elements to select, taking the total sentence length into account.

Step 3. Replace the softmax function used in the attention mechanism with our softmax output approximation function.

attention = softmax_approximation.apply(energy, mask)
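
To show where this call sits inside an attention layer, here is a minimal sketch. It does not import the repository: `ExactMaskedSoftmax` is a hypothetical stand-in that has the same `.apply(energy, mask)` call shape but computes an ordinary masked softmax and stores the full output, so it saves no memory. The function name `attention_weights`, the tensor shapes, and the `-1e10` fill value are also assumptions made for illustration.

```python
import torch


class ExactMaskedSoftmax(torch.autograd.Function):
    """Hypothetical stand-in with the same call shape as softmax_approximation."""

    @staticmethod
    def forward(ctx, energy, mask):
        # Positions where mask == 0 get a large negative score, so their weight is 0.
        energy = energy.masked_fill(mask == 0, -1e10)
        output = torch.softmax(energy, dim=-1)
        ctx.save_for_backward(output)  # the full output is kept (no approximation)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        (output,) = ctx.saved_tensors
        # Softmax backward: y * (g - sum(g * y)) along the last dimension.
        dot = (grad_output * output).sum(dim=-1, keepdim=True)
        grad_energy = output * (grad_output - dot)
        return grad_energy, None  # the mask receives no gradient


def attention_weights(query, key, mask, softmax_fn=ExactMaskedSoftmax):
    # energy: (batch, heads, query_len, key_len)
    energy = torch.matmul(query, key.transpose(-2, -1)) / key.size(-1) ** 0.5
    # Before the replacement this line would be a plain torch.softmax call.
    return softmax_fn.apply(energy, mask)
```

With the repository on the import path, passing `softmax_fn=softmax_approximation` would route the same `energy` and `mask` through the approximation function instead of the stand-in.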

