msaroufim
CUDA uninstallation failed. Please contact support for assistance
@PyTorch and @gpu-mode · Bay Area
Pinned Repositories
discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
lectures
Material for gpu-mode lectures
neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
awesome-profiling
Awesome utilities for performance profiling
C-compiler-optimizations
Description of commonly done compiler optimizations in C
ml-design-patterns
Software Architecture for ML engineers
multiple_dispatch
Why multiple dispatch lets you write composable code
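The composability claim can be illustrated with the closest stdlib analogue, `functools.singledispatch` (single, not multiple, dispatch): new types get their own behavior by registering a handler, without editing the original function. This is a minimal sketch, not code from the repository.

```python
# Sketch of dispatch-based extension: each type registers its own
# handler, so behavior composes without modifying describe() itself.
from functools import singledispatch

@singledispatch
def describe(x):
    # Fallback for unregistered types
    return f"object: {x!r}"

@describe.register
def _(x: int):
    return f"int: {x}"

@describe.register
def _(x: list):
    # Handlers compose: lists recursively dispatch on their elements
    return "list of " + ", ".join(describe(item) for item in x)

print(describe(3))          # int: 3
print(describe([1, "a"]))   # list of int: 1, object: 'a'
```

Adding support for a new type is one `@describe.register` away, which is the composability argument the repo's title makes.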
ao
PyTorch native quantization and sparsity for training and inference
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
serve
Serve, optimize and scale PyTorch models in production
msaroufim's Repositories
msaroufim/awesome-profiling
Awesome utilities for performance profiling
msaroufim/mynotes
msaroufim/pytorch-load-inline-highlighter
VS Code extension for syntax highlighting C++/CUDA/HIP code in PyTorch load_inline() strings
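The reason a highlighter is needed: with `torch.utils.cpp_extension.load_inline`, C++/CUDA source lives inside a plain Python string, which editors render without highlighting by default. A hedged sketch of such a call (names like `add_one_ext` are illustrative; actually building requires torch and a C++ toolchain, so the call is deferred):

```python
# C++ source embedded in a Python string -- exactly the pattern the
# highlighter extension targets.
cpp_source = """
#include <torch/extension.h>
torch::Tensor add_one(torch::Tensor x) { return x + 1; }
"""

def build_extension():
    # Deferred: compiling needs torch + a C++ compiler at runtime.
    from torch.utils.cpp_extension import load_inline
    return load_inline(
        name="add_one_ext",      # hypothetical extension name
        cpp_sources=cpp_source,
        functions=["add_one"],   # exposes add_one to Python
    )
```

After `build_extension()` succeeds, `module.add_one(tensor)` is callable from Python like any other function.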
msaroufim/tinyoptimizer
msaroufim/llm_coder
Help Claude learn about your library by giving it the main APIs in a prompt, integrated into VS Code
msaroufim/setup
msaroufim/Triton-Puzzles
Puzzles for learning Triton
msaroufim/gpumode-site
The world's best GPU community
msaroufim/ao
PyTorch native quantization and sparsity for training and inference
msaroufim/chess
msaroufim/cuda-python
CUDA Python: Performance meets Productivity
msaroufim/decent
msaroufim/factorio
msaroufim/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
msaroufim/Liger-Kernel
Efficient Triton Kernels for LLM Training
msaroufim/llm.c
LLM training in simple, raw C/CUDA
msaroufim/ruff
An extremely fast Python linter and code formatter, written in Rust.
msaroufim/ThunderKittens
Tile primitives for speedy kernels
msaroufim/timing
msaroufim/torchft
PyTorch per step fault tolerance (actively under development)
msaroufim/yolo-save
yolo push a commit to remote when you save any file
msaroufim/FACTO
Framework for Algorithmic Correctness Testing of Operators
msaroufim/flashinfer
FlashInfer: Kernel Library for LLM Serving
msaroufim/load_inline_slow
msaroufim/msaroufim
msaroufim/newblog
new blog, who dis?
msaroufim/quack
A Quirky Assortment of CuTe Kernels
msaroufim/sphinx-read-thedocs-test
msaroufim/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
msaroufim/vscodetemplate