This repository provides the implementation of minGRU, as described in:
Feng, L., Tung, F., Ahmed, M. O., Bengio, Y., & Hajimirsadeghi, H. (2024). Were RNNs All We Needed? arXiv preprint arXiv:2410.01201.
minGRU is a minimal GRU variant with simplified gates whose recurrence can be evaluated in log space for numerical stability and parallel efficiency. I implemented only the log-space variant.
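For reference, here is a minimal sketch of the log-space recurrence, following the pseudocode in the paper's appendix; the names `MinGRU` and `parallel_scan_log` are illustrative and need not match the identifiers used in this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def log_g(x):
    # log of the activation g(x) = x + 0.5 (x >= 0), sigmoid(x) (x < 0),
    # which the paper uses to keep candidate states positive in log space
    return torch.where(x >= 0, (F.relu(x) + 0.5).log(), -F.softplus(-x))

def parallel_scan_log(log_coeffs, log_values):
    # Associative scan in log space for h_t = a_t * h_{t-1} + b_t,
    # with log_coeffs = log a (B, T, D) and log_values = log [h_0, b_1..b_T] (B, T+1, D)
    a_star = F.pad(torch.cumsum(log_coeffs, dim=1), (0, 0, 1, 0))
    log_h = a_star + torch.logcumsumexp(log_values - a_star, dim=1)
    return torch.exp(log_h)[:, 1:]

class MinGRU(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear_z = nn.Linear(input_size, hidden_size)  # gate z_t
        self.linear_h = nn.Linear(input_size, hidden_size)  # candidate state
    def forward(self, x, h0=None):
        # x: (B, T, input_size); h0: (B, 1, hidden_size)
        if h0 is None:
            h0 = x.new_zeros(x.size(0), 1, self.linear_h.out_features)
        k = self.linear_z(x)
        log_z = -F.softplus(-k)       # log sigmoid(k)  = log z_t
        log_coeffs = -F.softplus(k)   # log sigmoid(-k) = log (1 - z_t)
        log_values = torch.cat([log_g(h0), log_z + log_g(self.linear_h(x))], dim=1)
        return parallel_scan_log(log_coeffs, log_values)  # (B, T, hidden_size)
```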
A wrapper that enables minGRU to function like PyTorch's GRU, including:
- Multi-layer support.
- Bidirectional processing.
- Compatibility with standard RNN pipelines.
This is mostly for comparison with standard models.
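As a hypothetical illustration of what such a wrapper can look like (reusing the `MinGRU` sketch above; the name `MinGRUWrapper` and its exact interface are assumptions, not this repository's API):

```python
class MinGRUWrapper(nn.Module):
    """Sketch of an nn.GRU-like interface (batch_first semantics assumed)."""
    def __init__(self, input_size, hidden_size, num_layers=1, bidirectional=False):
        super().__init__()
        self.bidirectional = bidirectional
        d = 2 if bidirectional else 1
        self.fwd = nn.ModuleList(
            MinGRU(input_size if i == 0 else d * hidden_size, hidden_size)
            for i in range(num_layers))
        self.bwd = nn.ModuleList(
            MinGRU(input_size if i == 0 else d * hidden_size, hidden_size)
            for i in range(num_layers)) if bidirectional else None

    def forward(self, x):
        # x: (B, T, input_size) -> (B, T, d * hidden_size)
        for i, fwd in enumerate(self.fwd):
            out = fwd(x)
            if self.bidirectional:
                # run the twin layer on the time-reversed sequence, then flip back
                back = self.bwd[i](x.flip(1)).flip(1)
                out = torch.cat([out, back], dim=-1)
            x = out
        return x  # a full nn.GRU drop-in would also return the final hidden state
```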
A modular block (MambaModule) combining:
- Residual connections.
- Two-strand processing.
- Convolutions and normalization.
For my application, this yielded the best results.
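The exact composition lives in the code; the following is only a guess at how such a block might wire these pieces together (reusing the `MinGRU` sketch, with a Mamba-style gated second strand; the kernel size and layout are assumptions):

```python
class MambaModule(nn.Module):
    """Assumed layout: norm -> two strands (conv + minGRU, gate) -> residual."""
    def __init__(self, dim, kernel_size=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, 2 * dim)  # split into the two strands
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)  # depthwise
        self.rnn = MinGRU(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, T, dim)
        residual = x
        a, b = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        # strand 1: causal depthwise conv (trim right padding), then the RNN
        a = self.conv(a.transpose(1, 2))[..., :a.size(1)].transpose(1, 2)
        a = self.rnn(F.silu(a))
        # strand 2: a simple multiplicative gate
        out = a * F.silu(b)
        return residual + self.out_proj(out)
```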
A stack of MambaModule layers for deep sequence modeling, with options for dimension projection.
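A stack like this can be sketched as follows (the projection names, final norm, and constructor signature are assumptions):

```python
class MambaStack(nn.Module):
    """Assumed: project in, apply N MambaModule blocks, project out."""
    def __init__(self, input_dim, model_dim, output_dim, depth):
        super().__init__()
        self.in_proj = nn.Linear(input_dim, model_dim)   # dimension projection in
        self.blocks = nn.ModuleList(MambaModule(model_dim) for _ in range(depth))
        self.norm = nn.LayerNorm(model_dim)
        self.out_proj = nn.Linear(model_dim, output_dim) # dimension projection out

    def forward(self, x):
        # x: (B, T, input_dim) -> (B, T, output_dim)
        x = self.in_proj(x)
        for block in self.blocks:
            x = block(x)
        return self.out_proj(self.norm(x))
```

Under these assumptions, `MambaStack(input_dim=16, model_dim=64, output_dim=8, depth=4)` maps a `(batch, time, 16)` tensor to `(batch, time, 8)`.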