dingshaohua960303's Stars
pytorch/torchtitan
A native PyTorch Library for large model training
FMInference/DejaVu
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
pytorch/captum
Model interpretability and understanding for PyTorch
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
rapidsai/rmm
RAPIDS Memory Manager
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
google/aqt
meta-llama/llama3
The official Meta Llama 3 GitHub site
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
f-dangel/cockpit
Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
microsoft/torchscale
Foundation Architecture for (M)LLMs
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
mistralai/mistral-inference
Official inference library for Mistral models
lsds/KungFu
Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.
CerebrasResearch/nanoGNS
Minimal SOGNS and PEPGNS with a nanoGPT example
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Stonesjtu/pytorch_memlab
Profiling and inspecting memory in PyTorch
isocpp/CppCoreGuidelines
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
IceClear/StableSR
[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
albanD/subclass_zoo
MegEngine/mperf
mperf is an operator performance tuning toolbox for mobile/embedded platforms
tpoisonooo/how-to-optimize-gemm
Row-major matmul optimization
triton-lang/triton
Development repository for the Triton language and compiler
bytedance/byteir
A model compilation solution for various hardware
MegEngine/InferLLM
A lightweight LLM inference framework
bytedance/ByteTransformer
Optimized BERT Transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs