pinxuezhao's Stars
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
unslothai/unsloth
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
mistralai/mistral-inference
Official inference library for Mistral models
THUDM/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
minitorch/minitorch
The full minitorch student suite.
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
NVIDIA/cuCollections
srush/annotated-mamba
Annotated version of the Mamba paper
harvardnlp/namedtensor
Named Tensor implementation for Torch
Dao-AILab/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
llm-random/llm-random
SamGinzburg/VectorVisor
VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly program in parallel using GPUs
netx-repo/PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
byungsoo-oh/ml-systems-papers
Curated collection of papers in machine learning systems
tommyip/mamba2-minimal
Minimal Mamba-2 implementation in PyTorch
parasailteam/coconet
kamalkraj/minGPT-TF
A minimal TF2 re-implementation of OpenAI GPT training
gpgpu-sim/pytorch-gpgpu-sim
Modified version of PyTorch able to work with changes to GPGPU-Sim
SJTU-IPADS/ugache
MINI-PYTORCH/MINI-TORCH
Mini-pytorch implemented from scratch using Python