wzhao18's Stars
SimplifyJobs/New-Grad-Positions
A collection of full-time roles in SWE, Quant, and PM for new grads.
HazyResearch/aisys-building-blocks
Building blocks for foundation models.
CentML/flexible-inference-bench
A modular, extensible LLM inference benchmarking framework that supports multiple benchmarking frameworks and paradigms.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
dottxt-ai/outlines
Structured Text Generation
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
tensor-compiler/taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
SimplifyJobs/Summer2025-Internships
Collection of Summer 2025 tech internships!
triton-lang/triton
Development repository for the Triton language and compiler
nchong/cudahook
Intercepting CUDA runtime calls with LD_PRELOAD
ampersand-projects/tilt
meta-llama/llama
Inference code for Llama models
yalue/cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
UofT-EcoSystem/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.
UofT-EcoSystem/GPU-Virtualization-Benchmarks
S-Lab-System-Group/HeliosData
Helios Traces from SenseTime
CompVis/stable-diffusion
A latent text-to-image diffusion model
apache/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
apple/ml-cvnets
CVNets: A library for training computer vision networks
hidet-org/hidet
An open-source, efficient deep learning framework/compiler, written in Python.
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
wangz585/whoami
facebook/rocksdb
A library that provides an embeddable, persistent key-value store for fast storage.
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
NVIDIA/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
SymbioticLab/Salus
Fine-grained GPU sharing primitives
tensorflow/tensorboard
TensorFlow's Visualization Toolkit
ultralytics/yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
alibaba/clusterdata
Cluster data collected from production clusters at Alibaba for cluster management research
mlcommons/training
Reference implementations of MLPerf™ training benchmarks