attention-stuff

Trying to understand how transformers work by diving into attention and multi-head attention blocks.
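
As a reference while working through the blocks, here is a minimal sketch of scaled dot-product attention and a multi-head attention layer in PyTorch. All names here (`scaled_dot_product_attention`, `MultiHeadAttention`, `d_model`, `num_heads`) are illustrative assumptions and do not refer to code in this repository.

```python
# Minimal sketch of attention and multi-head attention (illustrative only,
# not code from this repo). Names like d_model and num_heads are assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights


class MultiHeadAttention(nn.Module):
    """Projects inputs into num_heads subspaces, attends in each, then merges."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_head = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        batch, seq_len, _ = x.shape

        # Project, then split into heads: (batch, num_heads, seq_len, d_head)
        def split(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        out, _ = scaled_dot_product_attention(q, k, v, mask)

        # Merge heads back into a single (batch, seq_len, d_model) tensor
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, -1)
        return self.w_o(out)


if __name__ == "__main__":
    x = torch.randn(2, 10, 64)                      # (batch, seq_len, d_model)
    mha = MultiHeadAttention(d_model=64, num_heads=8)
    print(mha(x).shape)                             # torch.Size([2, 10, 64])
```

The idea behind multiple heads is that splitting `d_model` into several smaller subspaces lets each head learn to attend to different positions or patterns, while keeping the total computation roughly the same as a single full-width attention.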

Primary language: Python