cli99
@black-forest-labs. ex-Databricks Mosaic AI. ex-Microsoft DeepSpeed. UIUC PhD. I build efficient AI training and inference systems with GPUs.
Black Forest Labs
Pinned Repositories
academic-kickstart
Cheng's Website
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
acm-ccs
ACM Computing Classification System
bleve-search-react
cermine-docker
composer
Supercharge Your Model Training
flops-profiler
pytorch-profiler
hugo-cli99
llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
cli99's Repositories
cli99/flops-profiler
pytorch-profiler
cli99/accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
cli99/AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
cli99/cli99.github.io
cli99/CMU-CS11-711
Solutions of the CMU Advanced Natural Language Processing Course
cli99/hypermodern-python-cli99
cli99/CogView2
Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"
cli99/dalle-2-preview
cli99/dalle-mini
DALL·E Mini - Generate images from a text prompt
cli99/DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
cli99/dino
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
cli99/effective_transformer
Running BERT without Padding
cli99/EnergonAI
Large-scale model inference.
cli99/FasterTransformer
Transformer-related optimization, including BERT and GPT
cli99/flash-attention
Fast and memory-efficient exact attention
cli99/FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput large-batch generation.
cli99/imaginaire
NVIDIA PyTorch GAN library with distributed and mixed precision support
cli99/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
cli99/minbert-assignment
Minimalist BERT implementation assignment for CS11-711
cli99/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
cli99/mpipe
Python API for writing multiprocessing pipelines
cli99/project_openai_codex
cli99/python-mastery
Advanced Python Mastery (course by @dabeaz)
cli99/smdebug_examples
cli99/smoothquant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
cli99/TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
cli99/TLCBench
Benchmark scripts for TVM
cli99/torchview
torchview: visualize PyTorch models
cli99/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
cli99/tvm
Open deep learning compiler stack for CPU, GPU and specialized accelerators