Pinned Repositories
a100_workshop
Abacus
Awesome-DL-Scheduling-Papers
booksim
casio
cocktail
Cocktail: A Multidimensional Optimization for Model Serving in Cloud (NSDI'22)
MArk-Project
Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving
ML-Accelerators
Topics in Machine Learning Accelerator Design
ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
synergy
wyhmhs's Repositories
wyhmhs/synergy
wyhmhs/booksim
wyhmhs/casio
wyhmhs/ML-Accelerators
Topics in Machine Learning Accelerator Design
wyhmhs/confidential-computing-zoo
Confidential Computing Zoo provides confidential computing solutions based on Intel SGX, TDX, HEXL, and related technologies.
wyhmhs/EarlyRobust
wyhmhs/FlameGraph
Stack trace visualizer
wyhmhs/hack-SysML
The road to hack SysML and become an system expert
wyhmhs/igniter
iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud.
wyhmhs/llama.cpp
LLM inference in C/C++
wyhmhs/llama3-from-scratch
A llama3 implementation, one matrix multiplication at a time
wyhmhs/LLMPerf-for-TiledArch
Analytical Performance Model for Tiled Accelerators/Dies in Spatial Architecture Running Large Language Models (LLMs)
wyhmhs/LoRA-ViT
Low rank adaptation for Vision Transformer
wyhmhs/megablocks
wyhmhs/mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
wyhmhs/Muri
Artifacts for our SIGCOMM'22 paper Muri
wyhmhs/NeuPIMs
NeuPIMs Simulator
wyhmhs/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
wyhmhs/PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.
wyhmhs/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
wyhmhs/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
wyhmhs/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
wyhmhs/Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
wyhmhs/rafiki
Rafiki is a distributed system that supports training and deployment of machine learning models using AutoML, built with ease-of-use in mind.
wyhmhs/RobustSSL_Benchmark
Benchmark of robust self-supervised learning (RobustSSL) methods & Code for AutoLoRa (ICLR 2024).
wyhmhs/serve
Serve, optimize and scale PyTorch models in production
wyhmhs/TEESlice-artifact
wyhmhs/tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
wyhmhs/vmoe
wyhmhs/wyhmhs.github.io
Personal Website