Primary language: Rust. License: Apache-2.0.

PUMA

Puma aims to be a lightweight, high-performance inference engine for heterogeneous devices. It is currently under active development.

How to Run

Build

Run make build to build the puma binary.

Run

Run ./puma help to see all available commands.

For example, you can run ./puma version to see the binary version.

Supported Backends

Puma currently uses llama.cpp as the default backend for quick prototyping. A native backend is planned for the future.
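One way a default backend with a planned native alternative could be structured is sketched below. This is an illustration only and is not taken from the Puma source: the `Backend` trait, `BackendKind`, `BackendError`, `LlamaCppBackend`, `NativeBackend`, and `select_backend` are all hypothetical names, and the llama.cpp backend is a stub that does not call llama.cpp.

```rust
use std::fmt;

/// Hypothetical backend identifiers (assumed for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    LlamaCpp,
    Native,
}

/// Hypothetical error type for backends that cannot serve a request.
#[derive(Debug, PartialEq, Eq)]
enum BackendError {
    NotImplemented(BackendKind),
}

impl fmt::Display for BackendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BackendError::NotImplemented(kind) => {
                write!(f, "backend {:?} is not implemented yet", kind)
            }
        }
    }
}

/// Hypothetical interface an inference backend might expose.
trait Backend {
    fn kind(&self) -> BackendKind;
    fn generate(&self, prompt: &str) -> Result<String, BackendError>;
}

/// Stub standing in for a llama.cpp-backed implementation.
struct LlamaCppBackend;

impl Backend for LlamaCppBackend {
    fn kind(&self) -> BackendKind {
        BackendKind::LlamaCpp
    }

    fn generate(&self, prompt: &str) -> Result<String, BackendError> {
        // A real implementation would hand the prompt to llama.cpp here.
        Ok(format!("[llama.cpp stub] {}", prompt))
    }
}

/// Placeholder for a future native backend.
struct NativeBackend;

impl Backend for NativeBackend {
    fn kind(&self) -> BackendKind {
        BackendKind::Native
    }

    fn generate(&self, _prompt: &str) -> Result<String, BackendError> {
        Err(BackendError::NotImplemented(BackendKind::Native))
    }
}

/// Picks a backend by name, falling back to llama.cpp as the default.
fn select_backend(name: Option<&str>) -> Box<dyn Backend> {
    match name {
        Some("native") => Box::new(NativeBackend),
        _ => Box::new(LlamaCppBackend),
    }
}

fn main() {
    let backend = select_backend(None);
    println!("selected backend: {:?}", backend.kind());
    match backend.generate("hello") {
        Ok(text) => println!("{}", text),
        Err(err) => eprintln!("error: {}", err),
    }
}
```

Keeping the backend behind a trait object means a native implementation could later replace the stub without changing callers. Whether Puma takes this approach is an assumption.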