mental2008's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
tqdm/tqdm
:zap: A Fast, Extensible Progress Bar for Python and CLI
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
huggingface/candle
Minimalist ML framework for Rust
git-lfs/git-lfs
Git extension for versioning large files
liguodongiot/llm-action
This project shares technical principles and hands-on experience related to large language models (LLM engineering and production deployment of LLM applications)
bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
huggingface/text-generation-inference
Large Language Model Text Generation Inference
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT
facebookresearch/fairscale
PyTorch extensions for high performance and large scale training.
leptonai/leptonai
A Pythonic framework to simplify AI service building
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
jiqizhixin/Artificial-Intelligence-Terminology-Database
A comprehensive mapping database of English to Chinese technical vocabulary in the artificial intelligence domain
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
alibaba/havenask
ray-project/ray-llm
RayLLM - LLMs on Ray
Azure/AzurePublicDataset
Microsoft Azure Traces
sail-sg/lorahub
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Troyciv/anki-templates-superlist
A collection of Anki card styles
OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
nothingislost/obsidian-workspaces-plus
Quickly switch and manage Obsidian workspaces
eth-easl/orion
An interference-aware scheduler for fine-grained GPU sharing
Hsword/SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
ModelTC/awesome-lm-system
A summary of system papers, frameworks, code, and tools for training or serving large models