lihuahua123's Stars
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
chenzomi12/AISystem
AISystem covers the full low-level AI stack, including AI chips, AI compilers, and AI inference and training frameworks
bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
OdysseusYuan/LKY_OfficeTools
A one-click tool to automatically download, install, and activate Office.
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
ubicloud/ubicloud
Open source alternative to AWS. Elastic compute, block storage (non-replicated), firewall and load balancer, managed Postgres, and IAM services in public beta.
OpenWebGAL/WebGAL
A brand new web Visual Novel engine
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
b4rtaz/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
ray-project/ray-llm
RayLLM - LLMs on Ray
hao-ai-lab/LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
Azure/AzurePublicDataset
Microsoft Azure Traces
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
UbiquitousLearning/Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
HPMLL/BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 workload trace for optimizing LLM serving systems
eth-easl/orion
An interference-aware scheduler for fine-grained GPU sharing
Hsword/SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
tyler-griggs/melange-release
icloud-ecnu/igniter
iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud.
chenhongyu2048/LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
tonyzhao-jt/LLM-PQ
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
alpha-unito/Model-Agnostic-FL
Extension to the OpenFL framework for non-gradient-descent learning
Robyroc/Legio
Library to introduce fault-tolerance in MPI in the form of graceful degradation
cfl2005/ParaTra
LedgeDash/unum-paper