[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Primary language: Scala. License: MIT.