C-TC's Stars
stas00/ml-engineering
Machine Learning Engineering Open Book
plasma-umass/scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
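Scalene is invoked as a drop-in wrapper around the Python interpreter; a typical invocation (exact flags vary by version, and the script name here is a placeholder):

    scalene your_program.py    # profiles CPU, GPU, and memory by default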
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
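A hedged sketch of the high-level Python LLM API the description refers to; names follow recent releases, and the model checkpoint is a hypothetical example:

    # Builds (or loads) a TensorRT engine for the model, then runs inference.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # hypothetical checkpoint
    for out in llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32)):
        print(out.outputs[0].text)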
OptimalScale/LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
pytorch/torchtitan
A PyTorch native library for large model training
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
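XLA is most easily exercised from Python through JAX, whose jax.jit traces a function and hands it to XLA for compilation; a minimal sketch:

    import jax
    import jax.numpy as jnp

    @jax.jit  # traced once, compiled by XLA, cached for later calls
    def f(x):
        return jnp.tanh(x) @ x.T

    f(jnp.ones((128, 128)))  # first call compiles; later calls reuse the executable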
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
basicmi/AI-Chip
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
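A toy illustration of the bookkeeping behind 3D parallelism (not nanotron's API): the world size factors into data-, pipeline-, and tensor-parallel degrees, and each rank gets one coordinate per axis. The degrees below are hypothetical:

    DP, PP, TP = 2, 2, 2          # hypothetical data/pipeline/tensor degrees
    WORLD = DP * PP * TP

    def coords(rank):
        # Tensor-parallel ranks are adjacent so their heavy collectives
        # stay close together; data parallelism varies slowest.
        return rank // (PP * TP), (rank // TP) % PP, rank % TP

    for rank in range(WORLD):
        dp, pp, tp = coords(rank)
        print(f"rank {rank}: dp={dp} pp={pp} tp={tp}")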
ggchivalrous/yiyin
A tool for adding watermarks to photos
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
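Kineto is the engine behind torch.profiler, so the usual way to use it is indirectly; a minimal sketch that writes a timeline trace viewable in Perfetto or chrome://tracing:

    import torch
    from torch.profiler import profile, ProfilerActivity

    model = torch.nn.Linear(512, 512)
    x = torch.randn(64, 512)
    with profile(activities=[ProfilerActivity.CPU]) as prof:  # add CUDA on GPU
        model(x)
    prof.export_chrome_trace("trace.json")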
volcengine/veScale
A PyTorch Native LLM Training Framework
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
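A single-process sketch of the idea (not this repo's distributed API): Q stays put while K/V blocks arrive one at a time, as if rotated around the ring, and an online softmax folds each block into running statistics:

    import torch

    def ring_attention_sim(q, k_blocks, v_blocks):
        m = torch.full((q.shape[0], 1), float("-inf"))       # running row max
        l = torch.zeros(q.shape[0], 1)                       # softmax denominator
        acc = torch.zeros(q.shape[0], v_blocks[0].shape[1])  # unnormalized output
        scale = q.shape[-1] ** -0.5
        for k, v in zip(k_blocks, v_blocks):                 # one "ring step" each
            s = (q @ k.T) * scale
            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            c = torch.exp(m - m_new)                         # rescale old stats
            p = torch.exp(s - m_new)
            l = l * c + p.sum(dim=-1, keepdim=True)
            acc = acc * c + p @ v
            m = m_new
        return acc / l

    q = torch.randn(4, 8)
    ks = [torch.randn(4, 8) for _ in range(3)]
    vs = [torch.randn(4, 8) for _ in range(3)]
    full = torch.softmax((q @ torch.cat(ks).T) * 8 ** -0.5, dim=-1) @ torch.cat(vs)
    assert torch.allclose(ring_attention_sim(q, ks, vs), full, atol=1e-5)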
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
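A hedged sketch of README-style usage (API details may differ across versions): the library wraps an existing model and optimizer, with opt levels controlling how aggressively FP8 is applied:

    import torch
    import msamp

    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.AdamW(model.parameters())
    # opt_level selects how much weight/gradient/optimizer state
    # is kept in low precision (per the project's README).
    model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")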
AmadeusChan/Awesome-LLM-System-Papers
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
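For flavor, a minimal Triton kernel of the kind such modules are built from (an elementwise ReLU, unrelated to attorch's actual code):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def relu_kernel(x_ptr, y_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n                        # guard the ragged last block
        x = tl.load(x_ptr + offs, mask=mask)
        tl.store(y_ptr + offs, tl.maximum(x, 0.0), mask=mask)

    x = torch.randn(1000, device="cuda")
    y = torch.empty_like(x)
    relu_kernel[(triton.cdiv(x.numel(), 256),)](x, y, x.numel(), BLOCK=256)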
feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
Oneflow-Inc/libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
microsoft/msccl
Microsoft Collective Communication Library
LLaMafia/llamafia.github
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
microsoft/superbenchmark
A validation and profiling tool for AI infrastructure
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
pytorch-labs/float8_experimental
This repository contains the experimental PyTorch native float8 training UX
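The core mechanism is amax-based scaling; a conceptual sketch (not this repo's API), using PyTorch's float8_e4m3fn dtype:

    import torch

    FP8_MAX = 448.0  # largest finite value of torch.float8_e4m3fn

    def to_float8(t):
        # Scale so the largest magnitude lands at the fp8 max, cast,
        # and keep the scale for dequantization or a scaled matmul.
        scale = FP8_MAX / t.abs().max().clamp(min=1e-12)
        return (t * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn), scale

    x = torch.randn(16, 16)
    x_fp8, scale = to_float8(x)
    x_back = x_fp8.to(torch.float32) / scale  # dequantize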
microsoft/microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
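The MX idea in miniature (a sketch, not this library's API): elements are grouped into fixed-size blocks, each block shares one power-of-two scale, and the scaled elements are stored narrowly (int8 here stands in for MX element types such as FP8/FP6/FP4):

    import numpy as np

    BLOCK = 32  # MX block size

    def mx_quantize(x):
        x = x.reshape(-1, BLOCK)
        amax = np.abs(x).max(axis=1, keepdims=True)
        scale = 2.0 ** (np.floor(np.log2(amax + 1e-30)) - 6)  # shared per block
        return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

    def mx_dequantize(q, scale):
        return q.astype(np.float32) * scale

    x = np.random.randn(4, BLOCK).astype(np.float32)
    q, s = mx_quantize(x)
    print(np.abs(mx_dequantize(q, s) - x.reshape(-1, BLOCK)).max())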