Reduce memory requirements for LSTMs
Memory requirements are excessive for the LSTMs, especially during the backward/backpropagation pass.
Current requirements (i.e. workspace size, in elements; see the sketch after this list):
- Inference: 4 * 6 * H * W * N * D
- Training: 2 * 4 * 6 * H * W * N * D
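For reference, a minimal sketch of how these sizes work out, assuming H and W are the input height and width, N the batch size and D the number of hidden units (the helper names are hypothetical, not part of the library's API):

```cpp
#include <cstddef>

// Hypothetical helpers that just evaluate the formulas above. Sizes are in
// number of elements; multiply by sizeof(float) or sizeof(double) for bytes.
inline std::size_t lstm2d_inference_workspace_elems(int H, int W, int N, int D) {
  return std::size_t(4) * 6 * H * W * N * D;
}

inline std::size_t lstm2d_training_workspace_elems(int H, int W, int N, int D) {
  // Training currently needs twice as much: the extra space is used by the
  // backward pass.
  return 2 * lstm2d_inference_workspace_elems(H, W, N, D);
}
```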
I'm working on these improvements:
- As NVIDIA's cuDNN does, split the workspace into a work space and a reserve space: the work space is a temporary buffer that can be shared across multiple layers and is not needed once the forward/backward operation finishes. The reserve space stores data from the forward pass (activations) so that the backward pass can be performed efficiently.
- The reserve space is no longer needed once the backward pass finishes, so the training memory can easily be reduced by a factor of two (see the sketch after this list).
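A rough sketch of what the split could look like; the function names and the assumption that work and reserve end up the same size are mine, not the final design:

```cpp
#include <cstddef>

// Work space: temporary scratch, valid only while a single forward or
// backward call is running. One buffer, sized for the largest layer, can be
// shared by every LSTM layer in the network.
inline std::size_t lstm2d_work_elems(int H, int W, int N, int D) {
  return std::size_t(4) * 6 * H * W * N * D;
}

// Reserve space: activations written by the forward pass and read back by
// the backward pass. Needed only in training, and it can be released as soon
// as the backward pass of that layer is done.
inline std::size_t lstm2d_reserve_elems(int H, int W, int N, int D) {
  return std::size_t(4) * 6 * H * W * N * D;
}
```

Roughly speaking, the per-layer persistent cost in training then reduces to the reserve part only, since the work part is shared across layers and the reserve part dies after backward; that is where the factor-of-two reduction above comes from.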
During inference there is no need for a reserve space, only a work space, so that memory can be heavily reused by consecutive layers. Additionally, there is no need to keep a workspace of size 4 * 6 * H * W * N * D; 4 * 6 * 2 * min(H, W) * D is enough, since only two diagonals (the current and the previous one) need to be accessed at any given time, and no information has to be stored for the backward pass. However, this memory optimization may have some impact on speed.
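A minimal sketch of the two-diagonal buffering pattern for inference. process_diagonal is a hypothetical stand-in for the real kernel, and I include the batch dimension N explicitly in the buffer size; the point is only that the footprint no longer grows with H * W:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

void lstm2d_forward_inference_sketch(int H, int W, int N, int D) {
  // Each anti-diagonal touches at most min(H, W) cells, and only the current
  // and the previous diagonals are alive at any time.
  const std::size_t diag_elems = std::size_t(4) * 6 * std::min(H, W) * N * D;
  std::vector<float> buffers[2] = {std::vector<float>(diag_elems),
                                   std::vector<float>(diag_elems)};
  for (int t = 0; t < H + W - 1; ++t) {
    float* prev = buffers[(t + 1) % 2].data();  // diagonal t - 1 (read)
    float* curr = buffers[t % 2].data();        // diagonal t     (write)
    // process_diagonal(t, prev, curr /*, inputs, parameters, outputs */);
    (void)prev;
    (void)curr;
  }
  // Nothing has to be kept for a backward pass, so memory stays bounded by
  // two diagonal-sized buffers, independent of H * W.
}
```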
I'm working on these changes in the less_mem branch. The forward pass already works in both training and inference modes.