C-TC's Stars
stas00/ml-engineering
Machine Learning Engineering Open Book
plasma-umass/scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
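Scalene is invoked as a drop-in wrapper around the Python interpreter; a typical invocation (exact flags vary by version, and the script name here is a placeholder):

    scalene your_program.py    # profiles CPU, GPU, and memory by default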
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
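A hedged sketch of the high-level Python LLM API the description refers to; names follow recent releases, and the model checkpoint is a hypothetical example:

    # Builds (or loads) a TensorRT engine for the model, then runs inference.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # hypothetical checkpoint
    for out in llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32)):
        print(out.outputs[0].text)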
OptimalScale/LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
pytorch/torchtitan
A PyTorch native library for large model training
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
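XLA is most easily exercised from Python through JAX, whose jax.jit traces a function and hands it to XLA for compilation; a minimal sketch:

    import jax
    import jax.numpy as jnp

    @jax.jit  # traced once, compiled by XLA, cached for later calls
    def f(x):
        return jnp.tanh(x) @ x.T

    f(jnp.ones((128, 128)))  # first call compiles; later calls reuse the executable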
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
basicmi/AI-Chip
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
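A toy illustration of the bookkeeping behind 3D parallelism (not nanotron's API): the world size factors into data-, pipeline-, and tensor-parallel degrees, and each rank gets one coordinate per axis. The degrees below are hypothetical:

    DP, PP, TP = 2, 2, 2          # hypothetical data/pipeline/tensor degrees
    WORLD = DP * PP * TP

    def coords(rank):
        # Tensor-parallel ranks are adjacent so their heavy collectives
        # stay close together; data parallelism varies slowest.
        return rank // (PP * TP), (rank // TP) % PP, rank % TP

    for rank in range(WORLD):
        dp, pp, tp = coords(rank)
        print(f"rank {rank}: dp={dp} pp={pp} tp={tp}")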
ggchivalrous/yiyin
A tool for adding watermarks to photos
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
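Kineto is the engine behind torch.profiler, so the usual way to use it is indirectly; a minimal sketch that writes a timeline trace viewable in Perfetto or chrome://tracing:

    import torch
    from torch.profiler import profile, ProfilerActivity

    model = torch.nn.Linear(512, 512)
    x = torch.randn(64, 512)
    with profile(activities=[ProfilerActivity.CPU]) as prof:  # add CUDA on GPU
        model(x)
    prof.export_chrome_trace("trace.json")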
volcengine/veScale
A PyTorch Native LLM Training Framework
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
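A single-process sketch of the idea (not this repo's distributed API): Q stays put while K/V blocks arrive one at a time, as if rotated around the ring, and an online softmax folds each block into running statistics:

    import torch

    def ring_attention_sim(q, k_blocks, v_blocks):
        m = torch.full((q.shape[0], 1), float("-inf"))       # running row max
        l = torch.zeros(q.shape[0], 1)                       # softmax denominator
        acc = torch.zeros(q.shape[0], v_blocks[0].shape[1])  # unnormalized output
        scale = q.shape[-1] ** -0.5
        for k, v in zip(k_blocks, v_blocks):                 # one "ring step" each
            s = (q @ k.T) * scale
            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            c = torch.exp(m - m_new)                         # rescale old stats
            p = torch.exp(s - m_new)
            l = l * c + p.sum(dim=-1, keepdim=True)
            acc = acc * c + p @ v
            m = m_new
        return acc / l

    q = torch.randn(4, 8)
    ks = [torch.randn(4, 8) for _ in range(3)]
    vs = [torch.randn(4, 8) for _ in range(3)]
    full = torch.softmax((q @ torch.cat(ks).T) * 8 ** -0.5, dim=-1) @ torch.cat(vs)
    assert torch.allclose(ring_attention_sim(q, ks, vs), full, atol=1e-5)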
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
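A hedged sketch of README-style usage (API details may differ across versions): the library wraps an existing model and optimizer, with opt levels controlling how aggressively FP8 is applied:

    import torch
    import msamp

    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.AdamW(model.parameters())
    # opt_level selects how much weight/gradient/optimizer state
    # is kept in low precision (per the project's README).
    model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")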
AmadeusChan/Awesome-LLM-System-Papers
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
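For flavor, a minimal Triton kernel of the kind such modules are built from (an elementwise ReLU, unrelated to attorch's actual code):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def relu_kernel(x_ptr, y_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n                        # guard the ragged last block
        x = tl.load(x_ptr + offs, mask=mask)
        tl.store(y_ptr + offs, tl.maximum(x, 0.0), mask=mask)

    x = torch.randn(1000, device="cuda")
    y = torch.empty_like(x)
    relu_kernel[(triton.cdiv(x.numel(), 256),)](x, y, x.numel(), BLOCK=256)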
feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
Oneflow-Inc/libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
microsoft/msccl
Microsoft Collective Communication Library
LLaMafia/llamafia.github
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
microsoft/superbenchmark
A validation and profiling tool for AI infrastructure
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
pytorch-labs/float8_experimental
This repository contains the experimental PyTorch native float8 training UX
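The core mechanism is amax-based scaling; a conceptual sketch (not this repo's API), using PyTorch's float8_e4m3fn dtype:

    import torch

    FP8_MAX = 448.0  # largest finite value of torch.float8_e4m3fn

    def to_float8(t):
        # Scale so the largest magnitude lands at the fp8 max, cast,
        # and keep the scale for dequantization or a scaled matmul.
        scale = FP8_MAX / t.abs().max().clamp(min=1e-12)
        return (t * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn), scale

    x = torch.randn(16, 16)
    x_fp8, scale = to_float8(x)
    x_back = x_fp8.to(torch.float32) / scale  # dequantize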
microsoft/microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
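The MX idea in miniature (a sketch, not this library's API): elements are grouped into fixed-size blocks, each block shares one power-of-two scale, and the scaled elements are stored narrowly (int8 here stands in for MX element types such as FP8/FP6/FP4):

    import numpy as np

    BLOCK = 32  # MX block size

    def mx_quantize(x):
        x = x.reshape(-1, BLOCK)
        amax = np.abs(x).max(axis=1, keepdims=True)
        scale = 2.0 ** (np.floor(np.log2(amax + 1e-30)) - 6)  # shared per block
        return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

    def mx_dequantize(q, scale):
        return q.astype(np.float32) * scale

    x = np.random.randn(4, BLOCK).astype(np.float32)
    q, s = mx_quantize(x)
    print(np.abs(mx_dequantize(q, s) - x.reshape(-1, BLOCK)).max())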