/adaptive_attention_span

Implementation of the "Adaptive Attention Span in Transformers" paper.
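The repository's own code is not reproduced here. The sketch below shows the paper's core mechanism, following the formulation in Sukhbaatar et al. (2019); it is a minimal illustration under that assumption and is not taken from this repository. Each attention head scales its attention by a soft mask m_z(x) = min(max((R + z - x) / R, 0), 1). In that formula, x is the distance from the query back to a key, z is the learnable span, and R is a fixed ramp length. The function names `soft_span_mask` and `span_masked_attention` are hypothetical.

```python
import math
from typing import List


def soft_span_mask(distance: float, z: float, ramp: float) -> float:
    """Soft mask m_z(x) = min(max((R + z - x) / R, 0), 1).

    distance: how far back the key is from the query (x).
    z: current span (a learnable parameter in the paper; a plain float here).
    ramp: length R of the linear ramp from 1 down to 0.
    """
    return min(max((ramp + z - distance) / ramp, 0.0), 1.0)


def span_masked_attention(scores: List[float], z: float, ramp: float) -> List[float]:
    """Attention weights for a single query (hypothetical helper).

    scores[d] is the raw score for the key at distance d (d = 0 is the
    query's own position). Each exp(score) is multiplied by the soft mask
    before normalizing, so keys at distance >= z + R get exactly zero weight.
    """
    top = max(scores)  # subtract the max for numerical stability; cancels out
    masked = [
        soft_span_mask(d, z, ramp) * math.exp(s - top)
        for d, s in enumerate(scores)
    ]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("span mask zeroed every key; use z >= 0")
    return [m / total for m in masked]


if __name__ == "__main__":
    # Uniform scores make the effect of the mask easy to see.
    weights = span_masked_attention([0.0] * 7, z=2.0, ramp=4.0)
    print([round(w, 3) for w in weights])
    # -> [0.222, 0.222, 0.222, 0.167, 0.111, 0.056, 0.0]
```

The mask ramps down linearly instead of cutting off sharply, so it is differentiable with respect to z. This property is what lets each head learn its own span by gradient descent.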

Primary language: Jupyter Notebook
