pinxuezhao

pinxuezhao's Stars

karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Language: Python 36.2k 367 3155.7k
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python 26.8k 224 4.4k 3.9k
unslothai/unsloth
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
Language: Python 15.6k 102 8021k
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Language: Python 11.9k 101 511836
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language: Python 11.5k 201 2.2k 2.4k
mistralai/mistral-inference
Official inference library for Mistral models
Language: Jupyter Notebook 9.5k 122 136841
THUDM/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Language: Python 7.2k 115 219666
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
Language: Jupyter Notebook 5.7k 29 30335
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
Language: Python 4.4k 45 189436
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
Language: Python 2.9k 35 71144
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Language: Python 2.5k 24 27185
minitorch/minitorch
The full minitorch student suite.
Language: Python 1.8k 18 5351
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Language: Python 1.8k 41 295173
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
Language: Python 478 4 6840
NVIDIA/cuCollections
Language: C++ 464 17 18284
srush/annotated-mamba
Annotated version of the Mamba paper
Language: Jupyter Notebook 445 22 317
harvardnlp/namedtensor
Named Tensor implementation for Torch
Language: Jupyter Notebook 441 22 3142
Dao-AILab/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
Language: Cuda 282 4 1954
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
Language: C++ 233 19 8630
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
Language: Python 172 5 1725
llm-random/llm-random
Language: Python 168 9 1212
SamGinzburg/VectorVisor
VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly program in parallel using GPUs
Language: WebAssembly 142 5 53
netx-repo/PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
Language: Python 124 5 633
byungsoo-oh/ml-systems-papers
Curated collection of papers in machine learning systems
123 5 07
tommyip/mamba2-minimal
Minimal Mamba-2 implementation in PyTorch
Language: Python 88 2 19
parasailteam/coconet
Language: HTML 72 4 911
kamalkraj/minGPT-TF
A minimal TF2 re-implementation of the OpenAI GPT training
Language: Jupyter Notebook 55 3 218
gpgpu-sim/pytorch-gpgpu-sim
Modified version of PyTorch able to work with changes to GPGPU-Sim
Language: C++ 44 4 824
SJTU-IPADS/ugache
Language: C++ 20 9 24
MINI-PYTORCH/MINI-TORCH
Mini-pytorch implemented from scratch using Python
Language: Python 90