Pinned Repositories
calculon
cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
dace
DaCe - Data Centric Parallel Programming
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DHS-LLM-Workshop
DHS 2023 LLM Workshop by Sourab Mangrulkar
flash-attention
Fast and memory-efficient exact attention (a sketch of the underlying computation follows this list)
json-tutorial
A from-scratch tutorial on building a JSON library
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
mlir-dace
Data-Centric MLIR dialect
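Several of the pinned projects (flash-attention, lightllm, DeepSpeed) revolve around attention kernels. As context, here is a minimal PyTorch sketch of the exact scaled dot-product attention that flash-attention computes in a tiled, memory-efficient way; the tensor shapes are illustrative assumptions, not values taken from any of these repositories.

import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Exact scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    # flash-attention produces the same result but tiles the computation
    # so the full (seq x seq) score matrix never materializes in HBM.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5   # (batch, heads, seq, seq)
    return F.softmax(scores, dim=-1) @ v        # (batch, heads, seq, d)

# Illustrative shapes (assumptions): batch=2, heads=4, seq=128, head_dim=64.
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)
out = naive_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 128, 64])

The naive version allocates an O(seq^2) score matrix; the tiled formulation keeps memory at O(seq) per head, which is the point of the "memory-efficient exact attention" description above.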
C-TC's Repositories
C-TC/calculon
C-TC/cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
C-TC/dace
DaCe - Data Centric Parallel Programming
C-TC/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
C-TC/DHS-LLM-Workshop
DHS 2023 LLM Workshop by Sourab Mangrulkar
C-TC/flash-attention
Fast and memory-efficient exact attention
C-TC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
C-TC/LPG2vec
C-TC/master_thesis
C-TC/mlir-dace
Data-Centric MLIR dialect
C-TC/Optimizations-of-ball-arithmetic
C-TC/PsPIN-benchmark-sparse-reduction
C-TC/Reliable-Transport-Project
C-TC/megablocks
C-TC/Megatron-LLM
Distributed trainer for LLMs
C-TC/Megatron-LM
Ongoing research training transformer models at scale
C-TC/MS-AMP
Microsoft Automatic Mixed Precision Library
C-TC/nanotron
Minimalistic large language model 3D-parallelism training
C-TC/nccl
Optimized primitives for collective multi-GPU communication (see the all-reduce sketch after this list)
C-TC/nccl-tests
NCCL Tests
C-TC/NeMo-Framework-Launcher
NeMo Megatron launcher and tools
C-TC/OLMo
Modeling, training, eval, and inference code for OLMo
C-TC/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
C-TC/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
C-TC/taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
C-TC/torchtitan
A native PyTorch Library for large model training
C-TC/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
C-TC/triton
Development repository for the Triton language and compiler
C-TC/veScale
A PyTorch Native LLM Training Framework
C-TC/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
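The nccl and nccl-tests forks above concern collective communication. As a hedged illustration (not code from either repository), here is a minimal torch.distributed all-reduce using the NCCL backend when GPUs are available; the world size, rendezvous address, and tensor values are assumptions made for the sketch.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Assumed single-node rendezvous; address and port are illustrative.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # NCCL requires one GPU per rank; fall back to gloo on CPU-only hosts.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}") if backend == "nccl" else torch.device("cpu")
    if backend == "nccl":
        torch.cuda.set_device(device)
    # Each rank contributes its rank id; all-reduce sums elementwise across ranks.
    t = torch.full((4,), float(rank), device=device)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t.tolist()}")  # every rank prints the same summed tensor
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # assumption: two ranks for the sketch
    mp.spawn(worker, args=(world_size,), nprocs=world_size)

With two ranks, rank 0 contributes zeros and rank 1 contributes ones, so both ranks print [1.0, 1.0, 1.0, 1.0]; nccl-tests benchmarks exactly this kind of collective at scale.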