An implementation of local windowed attention for language modeling
Primary LanguagePythonMIT LicenseMIT