ColossalAI Implementation for BLOOM Inference

Under development. This repo is going to support BLOOM Inference with optimizations, such as Tensor Parallelism, Int8 quantization, with the help of ColossalAI.

Fast Inference Solutions for BLOOM

This repo provides demos and packages to perform fast inference solutions for BLOOM. Some of the solutions have their own repos in which case a link to the corresponding repos is provided instead.

Some of the solutions provide both half-precision and int8-quantized solution.

Client-side solutions

Solutions developed to perform large batch inference locally:

Pytorch:

Accelerate, DeepSpeed-Inference and DeepSpeed-ZeRO
Custom HF Code.

JAX:

BLOOM Inference in JAX

Server solutions

Solutions developed to be used in a server mode (i.e. varied batch size, varied request rate):

Pytorch:

Accelerate and DeepSpeed-Inference based solutions

Rust:

Bloom-server

feifeibear/ColoBloom

ColossalAI Implementation for BLOOM Inference

Fast Inference Solutions for BLOOM

Client-side solutions

Server solutions