msaroufim
CUDA uninstallation failed. Please contact support for assistance
@PyTorch and @gpu-mode · Bay Area
Pinned Repositories
discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
lectures
Material for gpu-mode lectures
neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
awesome-profiling
Awesome utilities for performance profiling
C-compiler-optimizations
Description of commonly done compiler optimizations in C
ml-design-patterns
Software Architecture for ML engineers
multiple_dispatch
Why multiple dispatch lets you write composable code
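The composability claim can be illustrated with the closest stdlib analogue, `functools.singledispatch` (single, not multiple, dispatch): new types get their own behavior by registering a handler, without editing the original function. This is a minimal sketch, not code from the repository.

```python
# Sketch of dispatch-based extension: each type registers its own
# handler, so behavior composes without modifying describe() itself.
from functools import singledispatch

@singledispatch
def describe(x):
    # Fallback for unregistered types
    return f"object: {x!r}"

@describe.register
def _(x: int):
    return f"int: {x}"

@describe.register
def _(x: list):
    # Handlers compose: lists recursively dispatch on their elements
    return "list of " + ", ".join(describe(item) for item in x)

print(describe(3))          # int: 3
print(describe([1, "a"]))   # list of int: 1, object: 'a'
```

Adding support for a new type is one `@describe.register` away, which is the composability argument the repo's title makes.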
ao
PyTorch native quantization and sparsity for training and inference
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
serve
Serve, optimize and scale PyTorch models in production
msaroufim's Repositories
msaroufim/awesome-profiling
Awesome utilities for performance profiling
msaroufim/mynotes
msaroufim/pytorch-load-inline-highlighter
VS Code extension for syntax highlighting C++/CUDA/HIP code in PyTorch load_inline() strings
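The reason a highlighter is needed: with `torch.utils.cpp_extension.load_inline`, C++/CUDA source lives inside a plain Python string, which editors render without highlighting by default. A hedged sketch of such a call (names like `add_one_ext` are illustrative; actually building requires torch and a C++ toolchain, so the call is deferred):

```python
# C++ source embedded in a Python string -- exactly the pattern the
# highlighter extension targets.
cpp_source = """
#include <torch/extension.h>
torch::Tensor add_one(torch::Tensor x) { return x + 1; }
"""

def build_extension():
    # Deferred: compiling needs torch + a C++ compiler at runtime.
    from torch.utils.cpp_extension import load_inline
    return load_inline(
        name="add_one_ext",      # hypothetical extension name
        cpp_sources=cpp_source,
        functions=["add_one"],   # exposes add_one to Python
    )
```

After `build_extension()` succeeds, `module.add_one(tensor)` is callable from Python like any other function.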
msaroufim/tinyoptimizer
msaroufim/llm_coder
Help Claude learn about your library by giving it the main APIs in a prompt, integrated into VS Code
msaroufim/setup
msaroufim/Triton-Puzzles
Puzzles for learning Triton
msaroufim/gpumode-site
The world's best GPU community
msaroufim/ao
PyTorch native quantization and sparsity for training and inference
msaroufim/chess
msaroufim/cuda-python
CUDA Python: Performance meets Productivity
msaroufim/decent
msaroufim/factorio
msaroufim/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
msaroufim/Liger-Kernel
Efficient Triton Kernels for LLM Training
msaroufim/llm.c
LLM training in simple, raw C/CUDA
msaroufim/ruff
An extremely fast Python linter and code formatter, written in Rust.
msaroufim/ThunderKittens
Tile primitives for speedy kernels
msaroufim/timing
msaroufim/torchft
PyTorch per step fault tolerance (actively under development)
msaroufim/yolo-save
yolo push a commit to remote when you save any file
msaroufim/FACTO
Framework for Algorithmic Correctness Testing of Operators
msaroufim/flashinfer
FlashInfer: Kernel Library for LLM Serving
msaroufim/load_inline_slow
msaroufim/msaroufim
msaroufim/newblog
new blog, who dis?
msaroufim/quack
A Quirky Assortment of CuTe Kernels
msaroufim/sphinx-read-thedocs-test
msaroufim/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
msaroufim/vscodetemplate