Pinned Repositories
calculon
cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
dace
DaCe - Data Centric Parallel Programming
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DHS-LLM-Workshop
DHS 2023 LLM Workshop by Sourab Mangrulkar
flash-attention
Fast and memory-efficient exact attention (a sketch of the underlying computation follows this list)
json-tutorial
A from-scratch tutorial on building a JSON library
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
mlir-dace
Data-Centric MLIR dialect
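Several of the pinned projects (flash-attention, lightllm, DeepSpeed) revolve around attention kernels. As context, here is a minimal PyTorch sketch of the exact scaled dot-product attention that flash-attention computes in a tiled, memory-efficient way; the tensor shapes are illustrative assumptions, not values taken from any of these repositories.

import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Exact scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    # flash-attention produces the same result but tiles the computation
    # so the full (seq x seq) score matrix never materializes in HBM.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5   # (batch, heads, seq, seq)
    return F.softmax(scores, dim=-1) @ v        # (batch, heads, seq, d)

# Illustrative shapes (assumptions): batch=2, heads=4, seq=128, head_dim=64.
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)
out = naive_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 128, 64])

The naive version allocates an O(seq^2) score matrix; the tiled formulation keeps memory at O(seq) per head, which is the point of the "memory-efficient exact attention" description above.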
C-TC's Repositories
C-TC/calculon
C-TC/cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
C-TC/dace
DaCe - Data Centric Parallel Programming
C-TC/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
C-TC/DHS-LLM-Workshop
DHS 2023 LLM Workshop by Sourab Mangrulkar
C-TC/flash-attention
Fast and memory-efficient exact attention
C-TC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
C-TC/LPG2vec
C-TC/master_thesis
C-TC/mlir-dace
Data-Centric MLIR dialect
C-TC/Optimizations-of-ball-arithmetic
C-TC/PsPIN-benchmark-sparse-reduction
C-TC/Reliable-Transport-Project
C-TC/megablocks
C-TC/Megatron-LLM
Distributed trainer for LLMs
C-TC/Megatron-LM
Ongoing research training transformer models at scale
C-TC/MS-AMP
Microsoft Automatic Mixed Precision Library
C-TC/nanotron
Minimalistic large language model 3D-parallelism training
C-TC/nccl
Optimized primitives for collective multi-GPU communication (see the all-reduce sketch after this list)
C-TC/nccl-tests
NCCL Tests
C-TC/NeMo-Framework-Launcher
NeMo Megatron launcher and tools
C-TC/OLMo
Modeling, training, eval, and inference code for OLMo
C-TC/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
C-TC/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
C-TC/taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
C-TC/torchtitan
A native PyTorch Library for large model training
C-TC/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
C-TC/triton
Development repository for the Triton language and compiler
C-TC/veScale
A PyTorch Native LLM Training Framework
C-TC/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
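The nccl and nccl-tests forks above concern collective communication. As a hedged illustration (not code from either repository), here is a minimal torch.distributed all-reduce using the NCCL backend when GPUs are available; the world size, rendezvous address, and tensor values are assumptions made for the sketch.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Assumed single-node rendezvous; address and port are illustrative.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # NCCL requires one GPU per rank; fall back to gloo on CPU-only hosts.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}") if backend == "nccl" else torch.device("cpu")
    if backend == "nccl":
        torch.cuda.set_device(device)
    # Each rank contributes its rank id; all-reduce sums elementwise across ranks.
    t = torch.full((4,), float(rank), device=device)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t.tolist()}")  # every rank prints the same summed tensor
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # assumption: two ranks for the sketch
    mp.spawn(worker, args=(world_size,), nprocs=world_size)

With two ranks, rank 0 contributes zeros and rank 1 contributes ones, so both ranks print [1.0, 1.0, 1.0, 1.0]; nccl-tests benchmarks exactly this kind of collective at scale.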