serving-infrastructure

There is 1 repository under the serving-infrastructure topic.

  • ksm26/Efficiently-Serving-LLMs

    Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.

    Language: Jupyter Notebook
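The KV caching mentioned in the description above can be illustrated with a minimal sketch: during autoregressive decoding, the key/value tensors for past tokens are cached and only one new row is appended per step, instead of recomputing attention inputs for the whole prefix. This is a toy single-head example with identity projections (`attend`, `decode`, and the shapes are illustrative assumptions, not the repo's actual code).

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for one query over the cached keys/values.
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def decode(token_embs):
    # Without a cache, step t would recompute K and V for all t tokens;
    # with the cache we append a single row per generated token.
    K_cache, V_cache = [], []
    outputs = []
    for x in token_embs:       # x: embedding of the newest token
        K_cache.append(x)      # toy projections: identity (real models use W_k x)
        V_cache.append(x)      # likewise W_v x in a real model
        K = np.stack(K_cache)
        V = np.stack(V_cache)
        outputs.append(attend(x, K, V))
    return np.stack(outputs)

out = decode(np.eye(3))        # three one-hot "embeddings"
print(out.shape)               # (3, 3): one attention output per decoded token
```

At the first step the cache holds a single entry, so the output equals that token's value vector; later steps attend over the growing cache, which is exactly the memory/compute trade-off serving frameworks manage.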