/spatten-llm

[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Primary Language: Scala | License: MIT
