DPFP-pytorch

Implementation of Deterministic Parameter-Free Projection (DPFP) from the paper "Linear Transformers Are Secretly Fast Weight Memory Systems"
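
As a rough illustration of what such a projection computes, here is a minimal dependency-free sketch of a DPFP-style feature map. It is not this repository's CUDA/PyTorch code; the function name `dpfp`, the parameter `nu`, and the cyclic shift direction are assumptions based on a reading of the paper. The input vector of length d is expanded to `[relu(x), relu(-x)]`, and each entry is multiplied by the entry `j` positions earlier (cyclically) for `j = 1..nu`, giving a non-negative output of length `2 * d * nu`.

```python
def dpfp(x, nu=1):
    """Hypothetical DPFP-style feature map on a plain Python list.

    Maps a length-d vector to a non-negative vector of length 2 * d * nu.
    """
    # r = [relu(x), relu(-x)] has length 2d; for each input coordinate,
    # at most one of its two halves is non-zero.
    r = [max(v, 0.0) for v in x] + [max(-v, 0.0) for v in x]
    n = len(r)
    out = []
    for j in range(1, nu + 1):
        # Pair every entry with the entry j positions earlier (cyclic shift).
        # The shift direction is an assumption of this sketch.
        out.extend(r[i] * r[(i - j) % n] for i in range(n))
    return out


features = dpfp([1.0, -2.0], nu=1)
```

Because every output entry is a product of two ReLU outputs, the features are non-negative, which is what a linear-attention kernel needs; larger `nu` trades memory for a higher-dimensional projection.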

Primary language: Cuda
License: BSD 2-Clause "Simplified" License (BSD-2-Clause)
