
PyTorch implementation of "Block Recurrent Transformers" (Hutchins & Schlag et al., 2022)

Primary language: Python. License: MIT.

Block Recurrent Transformer

A PyTorch implementation of "Block Recurrent Transformers" (Hutchins & Schlag et al., 2022). It owes a great deal to Phil Wang's x-transformers and is still very much a work in progress.

Dockerfile, requirements.txt, and environment.yaml because I love chaos.

Differences from the Paper (as of 2022/05/04)

  • Keys and values are not shared between the "vertical" and "horizontal" directions (the standard input -> output information flow and the recurrent state flow, respectively).
  • The state vectors are augmented with Rotary Embeddings for positional encoding, instead of using learned embeddings.
  • The special LSTM gate initialization is not yet implemented.
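
To make the rotary-embedding point concrete, here is a minimal sketch of rotating a set of state vectors by their index. It is not taken from this repo's code: the function names `rotary_angles` and `apply_rotary`, the tensor sizes, the base of 10000, and the half-split pairing of features are all assumptions chosen for illustration.

```python
import torch


def rotary_angles(num_positions: int, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Return rotation angles of shape (num_positions, dim // 2).

    Hypothetical helper: one frequency per feature pair, scaled by position.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(num_positions).float()
    return torch.outer(positions, inv_freq)


def apply_rotary(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate each feature pair (x[..., i], x[..., i + dim // 2]) by its angle.

    Assumes the half-split pairing convention; other implementations
    interleave even and odd features instead.
    """
    x1, x2 = x.chunk(2, dim=-1)
    cos, sin = angles.cos(), angles.sin()
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)


# Illustrative sizes only (assumptions, not the repo's defaults).
batch, num_state_vectors, dim = 2, 8, 16
state = torch.randn(batch, num_state_vectors, dim)

# Each state vector is rotated according to its index in the state,
# so no learned positional parameters are needed.
angles = rotary_angles(num_state_vectors, dim)
rotated_state = apply_rotary(state, angles)
```

Because each feature pair is only rotated, the norm of every state vector is unchanged, and the vector at index 0 passes through untouched. In attention, rotations like this are typically applied to queries and keys so that their dot product depends on relative position.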