This repo contains the API specifications for various components of the Llama Stack, as well as implementations for some of those APIs, such as model inference. The Llama Stack consists of toolchain-apis and agentic-apis; this repo contains the toolchain-apis.
You can install this repository as a package with `pip install llama-toolchain`.
If you want to install from source:
```shell
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-toolchain.git

# Create and activate a conda environment
conda create -n toolchain python=3.10
conda activate toolchain

# Install the package in editable mode
cd llama-toolchain
pip install -e .
```
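After the editable install, you can sanity-check that Python resolves the package from your source checkout. A minimal sketch, assuming the package installs under the module name `llama_toolchain` (inferred from the repo name):

```python
# Sketch: confirm that `pip install -e .` made the package importable.
# The module name "llama_toolchain" is an assumption based on the repo name.
import importlib.util

spec = importlib.util.find_spec("llama_toolchain")
if spec is not None:
    print(f"llama_toolchain found at {spec.origin}")
else:
    print("llama_toolchain is not importable; check that the conda env is active")
```

If the second message prints, the most common cause is running Python from a different environment than the one the editable install went into.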
The `llama` CLI makes it easy to configure and run the Llama toolchain. Read the CLI reference for details.
If you want to run FP8, you need the `fbgemm-gpu` package, which requires `torch >= 2.4.0` (currently only available in nightly builds, but releasing shortly).
```shell
ENV=fp8_env
conda create -n $ENV python=3.10
conda activate $ENV
pip3 install -r fp8_requirements.txt
```
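Before installing `fbgemm-gpu`, you can verify that the torch build in your environment is new enough. A rough sketch; the version-parsing helper below is ours (not part of this repo) and deliberately treats `2.4.0` nightly `dev` builds as qualifying, since that is currently the only way to get 2.4.0:

```python
# Sketch: check a version string against the FP8 requirement (torch >= 2.4.0).
# In practice you would pass torch.__version__; plain strings are used here
# so the helper can be shown without torch installed.
def meets_requirement(version: str, minimum=(2, 4, 0)) -> bool:
    core = version.split("+")[0]  # drop local suffixes like "+cu121"
    parts = []
    for piece in core.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts) >= minimum

print(meets_requirement("2.4.0.dev20240601+cu121"))  # nightly build qualifies
print(meets_requirement("2.3.1"))                    # too old
```

Note this is a loose check: a strict PEP 440 comparison would order `2.4.0.dev*` before `2.4.0`, so for anything beyond a quick sanity check, use `packaging.version.Version` instead.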