lihuahua123's Stars
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
chenzomi12/AISystem
AISystem covers the full low-level AI stack, including AI chips, AI compilers, and AI inference and training frameworks
bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
OdysseusYuan/LKY_OfficeTools
A one-click tool to automatically download, install, and activate Office.
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
ubicloud/ubicloud
Open source alternative to AWS. Elastic compute, block storage (non-replicated), firewall and load balancer, managed Postgres, and IAM services in public beta.
OpenWebGAL/WebGAL
A brand new web Visual Novel engine
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
b4rtaz/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
ray-project/ray-llm
RayLLM - LLMs on Ray
hao-ai-lab/LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
Azure/AzurePublicDataset
Microsoft Azure Traces
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
UbiquitousLearning/Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
HPMLL/BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 workload trace for optimizing LLM serving systems
eth-easl/orion
An interference-aware scheduler for fine-grained GPU sharing
Hsword/SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
tyler-griggs/melange-release
icloud-ecnu/igniter
iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud.
chenhongyu2048/LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
tonyzhao-jt/LLM-PQ
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
alpha-unito/Model-Agnostic-FL
Extension to the OpenFL framework for non-gradient-descent learning
Robyroc/Legio
Library to introduce fault-tolerance in MPI in the form of graceful degradation
cfl2005/ParaTra
LedgeDash/unum-paper