rdspring1
A PhD graduate researching Machine Learning, Locality-Sensitive Hashing, and Deep Learning Compilers.
Rice University; @RUSH-LAB; @Nvidia; Santa Clara
rdspring1's Stars
jwasham/coding-interview-university
A complete computer science study plan to become a software engineer.
microsoft/MS-DOS
The original sources of MS-DOS 1.25, 2.0, and 4.0 for reference purposes
karpathy/LLM101n
LLM101n: Let's build a Storyteller
meta-llama/llama3
The official Meta Llama 3 GitHub site
HigherOrderCO/Bend
A massively parallel, high-level programming language
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
triton-lang/triton
Development repository for the Triton language and compiler
khangich/machine-learning-interview
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.
mwouts/jupytext
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
smogon/pokemon-showdown
Pokémon battle simulator.
PabloMK7/citra
A Nintendo 3DS Emulator
wzchen/probability_cheatsheet
A comprehensive 10-page probability cheatsheet that covers a semester's worth of introduction to probability.
ridgerchu/matmulfreellm
Implementation for MatMul-free LM.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
NVIDIA/cccl
CUDA Core Compute Libraries
Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
srush/Triton-Puzzles
Puzzles for learning Triton
Ligo-Biosciences/AlphaFold3
Open source implementation of AlphaFold3
waymo-research/waymax
A JAX-based simulator for autonomous driving research.
volcengine/veScale
A PyTorch Native LLM Training Framework
open-mpi/hwloc
Hardware locality (hwloc)
te42kyfo/gpu-benches
Collection of benchmarks to measure basic GPU capabilities
endia-org/Endia
Arrays, Tensors and dynamic Neural Networks in Mojo 🔥
eth-cscs/COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
srush/prof8
Experimental paper writing linter.
GT-TDAlab/dagP
Multilevel Directed Acyclic Graph Partitioner
JacksonAllan/c_cpp_hash_tables_benchmark
A comparative, extendable benchmarking suite for C and C++ hash-table libraries.
eth-cscs/Tiled-MM
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
tonyzhang617/nomad-dist