This experiment tests whether invertible neural networks (i-RevNet, the Reversible Residual Network) can be combined with the Universal Transformer (UT). Because the activations of reversible layers can be reconstructed during the backward pass instead of being stored, this should give the UT memory-efficient backpropagation and allow training on GPUs with less memory.
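As a rough illustration of where the memory saving comes from, here is a minimal sketch of the additive coupling used by RevNet-style blocks. The functions `F` and `G` are hypothetical placeholders standing in for the UT sub-layers (self-attention and the transition function); the point is only that the inputs can be recomputed exactly from the outputs.

```python
import numpy as np

def F(x):
    # Placeholder sub-layer; any deterministic function works for the coupling.
    return np.tanh(x)

def G(x):
    # Second placeholder sub-layer.
    return 0.5 * x

def rev_forward(x1, x2):
    # Additive coupling: the input is split channel-wise into two streams.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # The inputs are reconstructed from the outputs, so intermediate
    # activations need not be kept in memory during the forward pass.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

During backpropagation, each block's inputs are recovered with `rev_inverse` just before its gradients are computed, so activation memory stays constant in depth.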
- Implement a first version
- Translate UT parameters to invertible UT parameters (each reversible stream takes half the hidden size)
- Verify the current implementation
- Check the parameter settings for attention (the channel size might also need to be reduced)
- Add a parameter that controls whether the layers are shared or not
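The parameter-translation item above could look roughly like the following. This is a hypothetical helper (name and fields are assumptions, not the actual implementation): since each reversible stream carries half the channels, the per-stream hidden size, and plausibly the attention head count, are halved so the concatenated width matches the original UT.

```python
def invertible_ut_config(hidden_size, num_heads):
    # Hypothetical translation from UT hyperparameters to per-stream
    # hyperparameters for the invertible UT. Each of the two reversible
    # streams gets half the hidden size; the head count is also halved
    # so the per-head dimension stays comparable.
    assert hidden_size % 2 == 0, "hidden size must split into two streams"
    stream_size = hidden_size // 2
    stream_heads = max(1, num_heads // 2)
    assert stream_size % stream_heads == 0, "head dim must divide stream size"
    return {"hidden_size": stream_size, "num_heads": stream_heads}
```

For example, a UT with hidden size 512 and 8 heads would map to two streams of size 256 with 4 heads each.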