FineInfer

Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)

Primary language: Python. License: MIT.
