Pinned Repositories
.github
23arxiv-sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
24MLSYS-prompt-cache
Modular and structured prompt caching for low-latency LLM inference
24PPOPP-Liger
25ASPLOS-Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
ATC23-Legion
RC4ML GNN System Projects
Awesome-Distributed-Deep-Learning
A curated list of awesome Distributed Deep Learning resources.
Awesome-DL-Scheduling-Papers
Optimus-CC
[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
SparDL
MachineLearningSystem's Repositories
MachineLearningSystem/24ECCV-ElasticCache
[ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache
MachineLearningSystem/24ECCV-FastV
[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
MachineLearningSystem/24ECCV-vfusion3d
[ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
MachineLearningSystem/24ICML-dejavu
MachineLearningSystem/24SIGCOMM-stellatrain
Official GitHub repository for the SIGCOMM '24 paper "Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs"
MachineLearningSystem/24SOSP-LoongServe
MachineLearningSystem/alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
MachineLearningSystem/cake
Distributed LLM inference for mobile, desktop and server.
MachineLearningSystem/ChatDev
Create customized software from a natural-language idea (through LLM-powered multi-agent collaboration)
MachineLearningSystem/core_scheduler
CoreScheduler: A High-Performance Scheduler for Large Model Training
MachineLearningSystem/DiT-MoE
Scaling Diffusion Transformers with Mixture of Experts
MachineLearningSystem/DoubleSparse
16x reduction in memory accesses with nearly no accuracy loss
MachineLearningSystem/GPTSwarm
🐝 GPTSwarm: LLM agents as (Optimizable) Graphs
MachineLearningSystem/Inf-DiT
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
MachineLearningSystem/Kolors
Kolors Team
MachineLearningSystem/learning-to-cache
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
MachineLearningSystem/LLMGA
Official implementation of "LLMGA: Multimodal Large Language Model-based Generation Assistant", ECCV 2024
MachineLearningSystem/lotus
MachineLearningSystem/mem0
The memory layer for Personalized AI
MachineLearningSystem/metron
LLM Serving Performance Evaluation Harness
MachineLearningSystem/MInference
To speed up inference for long-context LLMs, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
MachineLearningSystem/MoA
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
MachineLearningSystem/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
MachineLearningSystem/new-flash_attention_inference
Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
MachineLearningSystem/OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
MachineLearningSystem/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
MachineLearningSystem/OSDI24-ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
MachineLearningSystem/quokka
Making data lakes work for time series
MachineLearningSystem/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
MachineLearningSystem/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention