/CQIL

[ACL 2024] "CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers" by Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng

Primary Language: Python
