Pinned Repositories
lectures
Material for cuda-mode lectures
neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
awesome-profiling
Awesome utilities for performance profiling
C-compiler-optimizations
Descriptions of common compiler optimizations in C
ml-design-patterns
Software Architecture for ML engineers
multiple_dispatch
Why multiple dispatch lets you write composable code
examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
serve
Serve, optimize and scale PyTorch models in production
msaroufim's Repositories
msaroufim/awesome-profiling
Awesome utilities for performance profiling
msaroufim/mynotes
msaroufim/metal-tutorial
msaroufim/mlsys-experiments
stuff
msaroufim/tinyoptimizer
msaroufim/cpuoffload
msaroufim/setup
msaroufim/cpu-offload
msaroufim/algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
msaroufim/axolotl
Go ahead and axolotl questions
msaroufim/ClassyVision
An end-to-end PyTorch framework for image and video classification
msaroufim/gradient-checkpointing
msaroufim/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).
msaroufim/ImageBind
ImageBind: One Embedding Space to Bind Them All
msaroufim/keras-benchmarks-2
msaroufim/lecturex
msaroufim/lit-llama
Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code
msaroufim/llama-inference
experiments with inference on llama
msaroufim/llama2.c
Inference Llama 2 in one file of pure C
msaroufim/lm-evaluation-harness
A framework for few-shot evaluation of language models.
msaroufim/microbenchmarks
msaroufim/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
msaroufim/newblog
new blog, who dis?
msaroufim/nvcc4jupyter
A plugin for Jupyter Notebook to run CUDA C/C++ code
msaroufim/protoquant
Prototype routines for GPU quantization written using PyTorch.
msaroufim/pyperformance
Python Performance Benchmark Suite
msaroufim/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
msaroufim/pytorch.github.io
The website for PyTorch
msaroufim/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
msaroufim/serve
Serve, optimize and scale PyTorch models in production