cli99

@black-forest-labs. ex-Databricks Mosaic AI. ex-Microsoft DeepSpeed. UIUC PhD. I build efficient AI training and inference systems with GPUs.

Black Forest Labs

Pinned Repositories

academic-kickstart
Cheng's Website
Language: Shell 0 0 00
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
Language: Python 0 0 00
acm-ccs
ACM Computing Classification System
Language: Python 2 2 00
bleve-search-react
Language: JavaScript 1 1 00
cermine-docker
Language: Dockerfile 1 1 01
composer
Supercharge Your Model Training
Language: Python 1 0 00
flops-profiler
pytorch-profiler
Language: Python 51 1 88
hugo-cli99
Language: TeX 1 1 00
llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
Language: Python 454 8 1453

cli99's Repositories

cli99/flops-profiler
pytorch-profiler
Language: Python 51 1 88
cli99/accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
Language: Python 0 0 00
cli99/AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Language: Python 0 0 00
cli99/cli99.github.io
Language: HTML 0 1 00
cli99/CMU-CS11-711
Solutions of the CMU Advanced Natural Language Processing Course
Language: Jupyter Notebook 0 0 00
cli99/hypermodern-python-cli99
Language: Python 0 1 00
cli99/CogView2
Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"
Language: Python 0 0
cli99/dalle-2-preview
0 0
cli99/dalle-mini
DALL·E Mini - Generate images from a text prompt
cli99/DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
Language: Python 0 0
cli99/dino
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Language: Python 0 0
cli99/effective_transformer
Running BERT without Padding
Language: C++ 0 0
cli99/EnergonAI
Large-scale model inference.
Language: Python 0 0
cli99/FasterTransformer
Transformer related optimization, including BERT, GPT
Language: C++ 0 0
cli99/flash-attention
Fast and memory-efficient exact attention
Language: Python 0 0
cli99/FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput large-batch generation.
Language: Python 0 0
cli99/imaginaire
NVIDIA PyTorch GAN library with distributed and mixed precision support
Language: Python
cli99/kernl
Kernl lets you run Pytorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Language: Jupyter Notebook 0 0
cli99/minbert-assignment
Minimalist BERT implementation assignment for CS11-711
Language: Python 0 0
cli99/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
cli99/mpipe
Python API for writing multiprocessing pipelines
Language: Python
cli99/project_openai_codex
Language: JavaScript 0 0
cli99/python-mastery
Advanced Python Mastery (course by @dabeaz)
Language: Python 0 0
cli99/smdebug_examples
Language: Python
cli99/smoothquant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Language: Python 0 0
cli99/TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
Language: C++ 0 0
cli99/TLCBench
Benchmark scripts for TVM
cli99/torchview
torchview: visualize pytorch models
Language: Python 0 0
cli99/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
Language: Python 0 0
cli99/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Language: Python