Fast Inference Solutions for BLOOM

This repo provides demos and packages for fast inference of BLOOM. Some of the solutions live in their own repos, in which case a link to the corresponding repo is provided instead.

Some of the solutions offer both half-precision and int8-quantized variants.
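For orientation, here is a minimal sketch of what the two precision modes look like when loading BLOOM with the `transformers` library, assuming `accelerate` and `bitsandbytes` are installed; the checkpoint name and arguments are illustrative and not taken from this repo's scripts.

```python
# Illustrative sketch of half-precision vs. int8-quantized loading.
# A small BLOOM checkpoint is used here; the full bigscience/bloom loads the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Half-precision (fp16) loading, sharded across available devices.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# int8-quantized loading via bitsandbytes, which roughly halves GPU memory again.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)
```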

Client-side solutions

Solutions developed to perform large batch inference locally (a short batched-generation sketch follows the list below):

PyTorch:

JAX:
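Below is a minimal, self-contained sketch of local batched generation with plain PyTorch and `transformers`; the checkpoint, prompts, and generation parameters are illustrative, not taken from the scripts in this repo.

```python
# Illustrative local batched-generation sketch (not this repo's scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small BLOOM checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"  # decoder-only models should be left-padded for generation

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompts = [
    "BLOOM is a multilingual language model that",
    "The main advantage of batched inference is",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```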

Server solutions

Solutions developed to be used in server mode (i.e., varied batch size and varied request rate); an example client request sketch follows the list below:

PyTorch:

Rust:
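To give a feel for server mode, here is a minimal client-side sketch that posts a prompt to a locally running generation server over HTTP. The URL and JSON schema are assumptions for illustration (modeled on a text-generation-inference style `/generate` endpoint), not a documented contract of every server listed above.

```python
# Minimal HTTP client sketch; endpoint and payload schema are assumed for illustration.
import requests

response = requests.post(
    "http://localhost:8080/generate",  # assumed address of a locally running server
    json={
        "inputs": "DeepSpeed is a machine learning framework that",
        "parameters": {"max_new_tokens": 50},  # assumed parameter name
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```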