llm-serving
There are 53 repositories under the llm-serving topic.
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
bentoml/OpenLLM
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
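Several engines in this list (OpenLLM, vLLM, ialacol) expose OpenAI-compatible endpoints, so a client only needs to build the standard `/v1/chat/completions` request shape. A minimal stdlib-only sketch; the base URL and model name are assumptions for illustration, not defaults guaranteed by any of these servers:

```python
import json
from urllib.request import Request

# Hypothetical local endpoint for an OpenAI-compatible server.
BASE_URL = "http://localhost:3000/v1"

def build_chat_request(model: str, prompt: str) -> Request:
    """Build a standard OpenAI-style chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("mistral", "Hello!")
# Send with urllib.request.urlopen(req) once a compatible server is running.
```

Because the request shape is the same across these servers, swapping engines usually means changing only the base URL and model name.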
liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on practical experience.
bentoml/BentoML
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
skypilot-org/skypilot
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
SuperDuperDB/superduperdb
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
microsoft/aici
AICI: Prompts as (Wasm) Programs
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
ray-project/ray-llm
RayLLM - LLMs on Ray
mosecorg/mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully utilize your compute resources
hpcaitech/SwiftInfer
Efficient AI Inference & Serving
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
rohan-paul/LLM-FineTuning-Large-Language-Models
LLM (Large Language Model) FineTuning
ray-project/ray-educational-materials
A suite of hands-on training materials that shows how to scale CV, NLP, and time-series forecasting workloads with Ray.
helixml/helix
Multi-node production AI stack. Run the best of open source AI easily on your own servers. Create your own AI by fine-tuning open source models. Integrate LLMs with APIs. Run gptscript securely on the server
substratusai/runbooks
Finetune LLMs on K8s by using Runbooks
chenhunghan/ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
slai-labs/get-beam
Run GPU inference and training jobs on serverless infrastructure that scales with you.
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related works will be added over time. Contributions welcome!
HPMLL/BurstGPT
A GPT-3.5 & GPT-4 Workload Trace to Optimize LLM Serving Systems
mani-kantap/llm-inference-solutions
A collection of available inference solutions for LLMs
sugarcane-ai/sugarcane-ai
An npm-like package ecosystem for Prompts 🤖
friendliai/friendli-client
Friendli: the fastest serving engine for generative AI
asprenger/ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
AntonioGr7/pratical-llms
A collection of hands-on notebooks for LLM practitioners
OSS-Pole-Emploi/happy_vllm
A production-ready REST API for vLLM
ray-project/llms-in-prod-workshop-2023
Deploy and Scale LLM-based applications
oscinis-com/Awesome-LLM-Productization
Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
azminewasi/Awesome-LLMs-ICLR-24
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) 2024.
Neural-Dragon-AI/Cynde
A Framework For Intelligence Farming
ray-project/anyscale-berkeley-ai-hackathon
Ray and Anyscale for UC Berkeley AI Hackathon!
ehsanghaffar/ein-llm
A self-hosted personal chatbot API with FastAPI. It allows you to interact with the Llama2 LLM (and other open-source LLMs) to have natural language conversations, generate text, and perform various language-related tasks.
mddunlap924/LLM-Inference-Serving
This repository demonstrates LLM execution on CPUs using packages such as llamafile, highlighting low-latency, high-throughput, and cost-effective inference and serving.
IvanLuLyf/bunny-llm
Deno LLM API Service