llm-inference
There are 521 repositories under the llm-inference topic.
nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
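GPT4All also ships Python bindings; a minimal sketch of fully local generation, assuming the `gpt4all` package and a model name from its catalog (the name below is an example and may have rotated out):

```python
from gpt4all import GPT4All

# Downloads the model on first use; runs entirely on the local machine.
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # catalog name is an assumption
with model.chat_session():
    print(model.generate("Why run LLMs locally?", max_tokens=128))
```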
microsoft/autogen
A programming framework for agentic AI 🤖
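A minimal sketch of AutoGen's two-agent loop, assuming the classic `pyautogen` 0.2-style API (newer releases restructure it); model name and key are placeholders:

```python
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]}  # placeholders
assistant = AssistantAgent("assistant", llm_config=llm_config)
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

# The user proxy drives the conversation; the assistant replies via the LLM.
user.initiate_chat(assistant, message="List three bottlenecks in LLM inference.")
```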
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
bentoml/OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
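Because the endpoint is OpenAI-compatible, the stock `openai` client works against it; a sketch assuming a locally started server (the port and model name are placeholders, not verified OpenLLM defaults):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")  # local server, key unused
reply = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",          # whatever model the server loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```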
mistralai/mistral-inference
Official inference library for Mistral models
liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on, practical experience.
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
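PowerInfer exploits activation sparsity: for a given token most FFN neurons output (near) zero, so only the predicted-active "hot" neurons need computing. A toy NumPy illustration of the idea, not PowerInfer's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 1024))               # FFN weights: 4096 neurons
x = rng.standard_normal(1024)

active = rng.choice(4096, size=410, replace=False)  # a predictor flags ~10% as "hot"
y = np.zeros(4096)
y[active] = W[active] @ x                           # compute only the active rows
```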
bentoml/BentoML
The easiest way to serve AI apps and models: build reliable inference APIs, LLM apps, multi-model chains, RAG services, and much more!
openvinotoolkit/openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
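A minimal OpenVINO inference sketch, assuming a model already converted to OpenVINO IR with a static input shape ("model.xml" is a placeholder path):

```python
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model(core.read_model("model.xml"), "CPU")  # placeholder path

shape = list(compiled.input(0).shape)               # static input shape of the model
x = np.random.rand(*shape).astype(np.float32)
result = compiled(x)[compiled.output(0)]            # run inference, fetch first output
```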
superduper-io/superduper
Superduper: integrate AI models and machine-learning workflows with your database to implement custom AI applications without moving your data, including streaming inference, scalable model hosting, training, and vector search.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
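A sketch of LMDeploy's high-level pipeline API; the model id is an example and the `.text` field on response objects is an assumption from its docs:

```python
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")       # example model id
responses = pipe(["What does continuous batching buy you?"])
print(responses[0].text)
```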
neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
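A conceptual sketch of why sparsity helps on CPUs (not DeepSparse's API): a heavily pruned weight matrix stored in a sparse format skips the zero entries entirely:

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((2048, 2048)).astype(np.float32)
W[rng.random(W.shape) < 0.9] = 0.0        # 90% unstructured sparsity, as after pruning

W_sparse = csr_matrix(W)                  # stores only the ~10% nonzero weights
x = rng.standard_normal(2048).astype(np.float32)
assert np.allclose(W_sparse @ x, W @ x, atol=1e-3)
```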
DefTruth/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
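Several of the listed techniques (WINT8/4, AWQ) are weight-only quantization; a toy per-channel INT8 sketch of the core idea:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)

scale = np.abs(W).max(axis=1, keepdims=True) / 127.0   # one scale per output channel
W_int8 = np.round(W / scale).astype(np.int8)           # weights stored at 8 bits

x = rng.standard_normal(512).astype(np.float32)
y = (W_int8.astype(np.float32) * scale) @ x            # dequantize on the fly
print(float(np.abs(y - W @ x).max()))                  # small quantization error
```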
databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
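Medusa's extra decoding heads draft several future tokens that the base model then verifies. A toy greedy draft-and-verify loop over a mock model (the general idea; real implementations verify all drafts in one batched forward pass):

```python
def base_next_token(ctx):                 # stand-in for one base-model forward pass
    return (sum(ctx) + 1) % 7

def verify(draft, ctx):
    """Keep the longest draft prefix the base model agrees with, then append
    the base model's own token at the first disagreement (one 'free' token)."""
    accepted = []
    for tok in draft:
        if base_next_token(ctx + accepted) != tok:
            break
        accepted.append(tok)
    accepted.append(base_next_token(ctx + accepted))
    return accepted

print(verify(draft=[1, 2, 3], ctx=[0]))   # -> [1, 2, 4]: two accepted + one corrected
```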
NVIDIA/GenerativeAIExamples
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs and runs LLMs efficiently on Intel platforms ⚡
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
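The reason multi-LoRA serving scales: adapters are tiny low-rank factors applied on top of one shared base weight, so a server like LoRAX can hot-swap the (A, B) pair per request. A toy sketch of the math:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 1024, 8, 16.0
W = rng.standard_normal((d, d)).astype(np.float32)   # frozen base weight (shared)
A = rng.standard_normal((r, d)).astype(np.float32)   # per-adapter down-projection
B = rng.standard_normal((d, r)).astype(np.float32)   # per-adapter up-projection

x = rng.standard_normal(d).astype(np.float32)
y = W @ x + (alpha / r) * (B @ (A @ x))              # adapter applied without merging
```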
liltom-eth/llama2-webui
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local Llama 2 backend for generative agents/apps.
microsoft/aici
AICI: Prompts as (Wasm) Programs
b4rtaz/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
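The core trick behind running one model across home devices: shard each weight matrix so every node computes a slice of the matmul, then combine the partial results. A toy single-process illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 1024))
x = rng.standard_normal(1024)

shards = np.split(W, 4, axis=0)            # 4 "devices", each holding 1/4 of the rows
partials = [w_i @ x for w_i in shards]     # each runs in parallel on a real cluster
y = np.concatenate(partials)               # gather the slices
assert np.allclose(y, W @ x)
```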
dstackai/dstack
dstack is an open-source alternative to Kubernetes, designed to simplify the development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU accelerators.
ray-project/ray-llm
RayLLM - LLMs on Ray
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
lean-dojo/LeanCopilot
LLMs as Copilots for Theorem Proving in Lean
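A minimal Lean sketch of how such a copilot is invoked mid-proof, assuming LeanCopilot's documented tactics (tactic name taken from its README; treat as an assumption):

```lean
import LeanCopilot

-- Ask the LLM-backed tactic to search for a complete proof of the goal.
example (a b : Nat) : a + b = b + a := by
  search_proof
```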
character-ai/prompt-poet
Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.
SafeAILab/EAGLE
Official Implementation of EAGLE-1 and EAGLE-2
stoyan-stoyanov/llmflows
LLMFlows - Simple, Explicit and Transparent LLM Apps
ghimiresunil/LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.
anarchy-ai/LLM-VM
irresponsible innovation. Try now at https://chat.dev/
run-ai/genv
GPU environment and cluster management with LLM support
hpcaitech/SwiftInfer
Efficient AI Inference & Serving
rohan-paul/LLM-FineTuning-Large-Language-Models
LLM (Large Language Model) FineTuning
Kenza-AI/sagify
LLMs and Machine Learning done easily
FlagAI-Open/Aquila2
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
EulerSearch/embedding_studio
Embedding Studio is a framework that allows you to transform your vector database into a feature-rich search engine.