Pinned Repositories
lectures
Material for gpu-mode lectures
neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
awesome-profiling
Awesome utilities for performance profiling
C-compiler-optimizations
Description of commonly done compiler optimizations in C
ml-design-patterns
Software Architecture for ML engineers
multiple_dispatch
Why multiple dispatch lets you write composable code
ao
PyTorch native quantization and sparsity for training and inference
examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
serve
Serve, optimize and scale PyTorch models in production
msaroufim's Repositories
msaroufim/awesome-profiling
Awesome utilities for performance profiling
msaroufim/mynotes
msaroufim/mlsys-experiments
stuff
msaroufim/metal-tutorial
msaroufim/tinyoptimizer
msaroufim/Triton-Puzzles
Puzzles for learning Triton
msaroufim/cpuoffload
msaroufim/setup
msaroufim/cpu-offload
msaroufim/ao
PyTorch native quantization and sparsity for training and inference
msaroufim/axolotl
Go ahead and axolotl questions
msaroufim/factorio-rl
msaroufim/gradient-checkpointing
msaroufim/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).
msaroufim/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
msaroufim/keras-benchmarks-2
msaroufim/lecturex
msaroufim/Liger-Kernel
Efficient Triton Kernels for LLM Training
msaroufim/lit-llama
Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code
msaroufim/llm.c
LLM training in simple, raw C/CUDA
msaroufim/lm-evaluation-harness
A framework for few-shot evaluation of language models.
msaroufim/microbenchmarks
msaroufim/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
msaroufim/newblog
new blog, who dis?
msaroufim/nvcc4jupyter
A plugin for Jupyter Notebook to run CUDA C/C++ code
msaroufim/pyperformance
Python Performance Benchmark Suite
msaroufim/pytorch.github.io
The website for PyTorch
msaroufim/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
msaroufim/subclass_zoo
msaroufim/yolo-save
yolo push a commit to remote when you save any file