Pinned Repositories
.github
23arxiv-sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
24MLSYS-prompt-cache
Modular and structured prompt caching for low-latency LLM inference
24PPOPP-Liger
25ASPLOS-Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
ATC23-Legion
RC4ML GNN System Projects
Awesome-Distributed-Deep-Learning
A curated list of awesome Distributed Deep Learning resources.
Awesome-DL-Scheduling-Papers
Optimus-CC
[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
SparDL
MachineLearningSystem's Repositories
MachineLearningSystem/24ECCV-ElasticCache
[ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache
MachineLearningSystem/24ECCV-FastV
[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
MachineLearningSystem/24ECCV-vfusion3d
[ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
MachineLearningSystem/24ICML-dejavu
MachineLearningSystem/24SIGCOMM-stellatrain
Official GitHub repository for the SIGCOMM '24 paper "Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs"
MachineLearningSystem/24SOSP-LoongServe
MachineLearningSystem/alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
MachineLearningSystem/cake
Distributed LLM inference for mobile, desktop and server.
MachineLearningSystem/ChatDev
Create customized software from a natural-language idea (through LLM-powered multi-agent collaboration)
MachineLearningSystem/core_scheduler
CoreScheduler: A High-Performance Scheduler for Large Model Training
MachineLearningSystem/DiT-MoE
Scaling Diffusion Transformers with Mixture of Experts
MachineLearningSystem/DoubleSparse
16x reduction in memory accesses with nearly no accuracy loss
MachineLearningSystem/GPTSwarm
🐝 GPTSwarm: LLM agents as (Optimizable) Graphs
MachineLearningSystem/Inf-DiT
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
MachineLearningSystem/Kolors
Kolors Team
MachineLearningSystem/learning-to-cache
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
MachineLearningSystem/LLMGA
Official implementation of "LLMGA: Multimodal Large Language Model-based Generation Assistant", ECCV 2024
MachineLearningSystem/lotus
MachineLearningSystem/mem0
The memory layer for Personalized AI
MachineLearningSystem/metron
LLM Serving Performance Evaluation Harness
MachineLearningSystem/MInference
To speed up inference for long-context LLMs, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
MachineLearningSystem/MoA
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
MachineLearningSystem/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
MachineLearningSystem/new-flash_attention_inference
Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
MachineLearningSystem/OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
MachineLearningSystem/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
MachineLearningSystem/OSDI24-ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
MachineLearningSystem/quokka
Making data lakes work for time series
MachineLearningSystem/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
MachineLearningSystem/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention