dingshaohua960303's Stars
pytorch/torchtitan
A native PyTorch Library for large model training
FMInference/DejaVu
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
pytorch/captum
Model interpretability and understanding for PyTorch
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
rapidsai/rmm
RAPIDS Memory Manager
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
google/aqt
meta-llama/llama3
The official Meta Llama 3 GitHub site
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
f-dangel/cockpit
Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
microsoft/torchscale
Foundation Architecture for (M)LLMs
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
mistralai/mistral-inference
Official inference library for Mistral models
lsds/KungFu
Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.
CerebrasResearch/nanoGNS
Minimal SOGNS and PEPGNS with a nanoGPT example
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Stonesjtu/pytorch_memlab
Profiling and inspecting memory in PyTorch
isocpp/CppCoreGuidelines
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
IceClear/StableSR
[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
albanD/subclass_zoo
MegEngine/mperf
mperf is an operator performance tuning toolbox for mobile/embedded platforms
tpoisonooo/how-to-optimize-gemm
Row-major matmul optimization
triton-lang/triton
Development repository for the Triton language and compiler
bytedance/byteir
A model compilation solution for various hardware
MegEngine/InferLLM
A lightweight LLM inference framework
bytedance/ByteTransformer
Optimized BERT Transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs