cli99
@black-forest-labs. ex-Databricks Mosaic AI. ex-Microsoft DeepSpeed. UIUC PhD. I build efficient AI training and inference systems with GPUs.
Black Forest Labs
Pinned Repositories
academic-kickstart
Cheng's Website
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support
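A minimal sketch of the standard Accelerator training-loop pattern this library provides; the toy model, optimizer, and data below are placeholders for illustration, not anything from this profile.

import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up multi-GPU / TPU / mixed-precision config from the launch environment
model = torch.nn.Linear(128, 10)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

# prepare() wraps model/optimizer/dataloader for the current distributed setup
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)                        # replaces loss.backward()
    optimizer.step()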
acm-ccs
ACM Computing Classification System
bleve-search-react
cermine-docker
composer
Supercharge Your Model Training
flops-profiler
pytorch-profiler
hugo-cli99
llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
cli99's Repositories
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
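The kind of back-of-envelope estimate this tool automates can be sketched in a few lines; the 6·N·D training-FLOPs rule, 2 bytes per fp16 parameter, and the KV-cache formula below are standard approximations, not numbers or code taken from the repo.

def train_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6.0 * num_params * num_tokens

def fp16_weight_memory_gb(num_params: float) -> float:
    """Weights in fp16/bf16 take ~2 bytes per parameter."""
    return 2.0 * num_params / 1e9

def kv_cache_memory_gb(batch, seq_len, num_layers, hidden_size, bytes_per_elem=2):
    """KV cache: K and V tensors per layer, each of shape [batch, seq_len, hidden_size]."""
    return 2 * batch * seq_len * num_layers * hidden_size * bytes_per_elem / 1e9

# Illustrative shapes: a 7B-parameter model, 2k context, batch size 8
print(fp16_weight_memory_gb(7e9))             # ~14 GB of weights
print(kv_cache_memory_gb(8, 2048, 32, 4096))  # ~8.6 GB of KV cache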
cli99/composer
Supercharge Your Model Training
cli99/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
cli99/AutoFP8
cli99/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
cli99/cutlass
CUDA Templates for Linear Algebra Subroutines
cli99/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
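A hedged sketch of the deepspeed.initialize() entry point with a ZeRO stage-2 config; the model and config values are illustrative, and a real run would typically be launched with the deepspeed launcher.

import torch
import deepspeed

model = torch.nn.Linear(128, 10)                      # placeholder model
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},                # shard optimizer states and gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-3}},
}

# initialize() returns an engine that owns the optimizer, loss scaling, and communication
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

inputs = torch.randn(32, 128, device=model_engine.device).half()
targets = torch.randint(0, 10, (32,), device=model_engine.device)

loss = torch.nn.functional.cross_entropy(model_engine(inputs), targets)
model_engine.backward(loss)                           # engine-managed backward pass
model_engine.step()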
cli99/Diffusion-Models-pytorch
PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)
cli99/KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
cli99/llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
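For context, a generic group-wise 4-bit weight quantizer is sketched below; this is not the AWQ algorithm itself, which additionally rescales salient weight channels using activation statistics before quantizing.

import torch

def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group int4 quantization of a [out_features, in_features] weight."""
    out_f, in_f = w.shape
    w_groups = w.reshape(out_f, in_f // group_size, group_size)
    scales = w_groups.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 7.0  # int4 range [-8, 7]
    q = torch.clamp(torch.round(w_groups / scales), -8, 7)
    return q.to(torch.int8), scales  # real kernels pack two int4 values per byte

def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() * scales).reshape(q.shape[0], -1)

w = torch.randn(4096, 4096)
q, s = quantize_int4_groupwise(w)
print((dequantize(q, s) - w).abs().mean())  # mean quantization error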
cli99/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
cli99/llm-foundry
LLM training code for MosaicML foundation models
cli99/llm-scripts
cli99/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.
cli99/megablocks
cli99/minRF
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
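A minimal sketch of the rectified-flow training objective that this kind of repo implements; the model argument is a placeholder callable, not the repo's architecture.

import torch

def rectified_flow_loss(model, x0: torch.Tensor) -> torch.Tensor:
    """Regress the predicted velocity toward (x1 - x0) along straight noise-data paths."""
    x1 = torch.randn_like(x0)                               # noise endpoint
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)), device=x0.device)
    x_t = (1 - t) * x0 + t * x1                             # linear interpolation at time t
    v_target = x1 - x0                                      # constant velocity along the path
    return torch.nn.functional.mse_loss(model(x_t, t), v_target)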
cli99/nccl-tests
NCCL Tests
cli99/quant-matmul
cli99/sampleproject
A sample project that exists for PyPUG's "Tutorial on Packaging and Distributing Projects"
cli99/starter-academic
cli99/stk
cli99/superbenchmark
A validation and profiling tool for AI infrastructure
cli99/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
cli99/theme-academic-cv
🎓 Easily create a beautiful academic résumé or educational website using Hugo and GitHub. No code.
cli99/torchtitan
A native PyTorch Library for large model training
cli99/transformer_framework
A framework for plug-and-play use of various transformers (vision and NLP) with FSDP
cli99/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
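A sketch of the FP8 autocast pattern from TransformerEngine's documented PyTorch API; the layer size and recipe values are illustrative, and exact arguments may differ across versions.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

model = te.Linear(768, 768, bias=True).cuda()        # drop-in replacement for nn.Linear
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

inp = torch.randn(16, 768, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)                                 # GEMMs execute in FP8 with delayed scaling
out.sum().backward()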
cli99/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
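A minimal sketch of the AutoTokenizer/AutoModel loading and generation pattern; "gpt2" is just a small public checkpoint chosen for illustration.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Efficient transformer inference", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))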
cli99/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
cli99/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
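A short sketch of vLLM's offline generation API; the model name and sampling parameters are illustrative.

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                 # engine with paged KV-cache management
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain KV-cache paging in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)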