OrenLeung
@teslamotors @uWaterloo | Former @nvidia @Cohere-AI @uptake @hackclub @Voic.AI
@uWaterloo
Toronto, Canada
Pinned Repositories
ao
PyTorch native quantization and sparsity for training and inference
benchmark_gemm
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
OrenLeung's Repositories
OrenLeung/benchmark_gemm
OrenLeung/ao
PyTorch native quantization and sparsity for training and inference
OrenLeung/CUDALibrarySamples
CUDA Library Samples
OrenLeung/cutlass
CUDA Templates for Linear Algebra Subroutines
OrenLeung/ml-engineering
Machine Learning Engineering Open Book
OrenLeung/nanoGPT-amd
The simplest, fastest repository for training/finetuning medium-sized GPTs.
OrenLeung/nccl-tests
NCCL Tests
OrenLeung/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
OrenLeung/foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
OrenLeung/fsdp
OrenLeung/nano-llama31
nanoGPT style version of Llama 3.1
OrenLeung/nccl
Optimized primitives for collective multi-GPU communication
OrenLeung/training_results_v4.0
This repository contains the results and code for the MLPerf™ Training v4.0 benchmark.
OrenLeung/TransformerEngine
OrenLeung/triton
Development repository for the Triton language and compiler