| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| Meta llama 3 8B F16 | 14.96 GiB | 8.03 B | CPU | 4 | pp512 | 8.95 ± 0.99 |
| Meta llama 3 8B F16 | 14.96 GiB | 8.03 B | CPU | 4 | tg128 | 2.15 ± 0.02 |
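A quick sanity check on what tg128 at 2.15 t/s means in wall-clock terms (arithmetic only, using the table's own numbers):

```shell
# 128 generated tokens at 2.15 tokens/s: total generation time
awk 'BEGIN { printf "%.1f s\n", 128 / 2.15 }'
# → 59.5 s
```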
- We're using Georgi Gerganov's llama-bench
- llama-bench benchmarks only one engine (llama.cpp's own executor); it doesn't benchmark, for example, Triton
- Headings
  - model: the LLM being benchmarked, e.g. Meta's Meta-Llama-3-8B.
  - size: total size of the model's weights, e.g. 14.96 GiB (8.03 B F16 params at 2 bytes each).
  - params: number of the model's parameters, e.g. "8.03 B" (8 billion)
  - backend: the compute backend that runs inference:
    - CPU: you don't want this ;-)
    - GPU: Graphics Processing Unit
  - threads: number of CPU threads used for inference, e.g. "4"
  - test: type of test performed:
    - pp: "Prompt processing: processing a prompt in batches". Higher is better.
    - tg: "Text generation: generating a sequence of tokens". Higher is better.
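Note that the digits in a test name are the token count: pp512 processes a 512-token prompt, tg128 generates 128 tokens. A throwaway one-liner to pull the count out of a test name:

```shell
# Strip the letters to recover the token count from a llama-bench test name
for t in pp512 tg128; do
  echo "$t: $(echo "$t" | sed 's/[a-z]*//g') tokens"
done
# → pp512: 512 tokens
# → tg128: 128 tokens
```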
Typical invocation to run benchmark against a model:
./llama-bench -m models/ggml-model-f16.gguf
To convert a model from Hugging Face to llama.cpp's .gguf
format:
python convert_hf_to_gguf.py ~/workspace/Meta-Llama-3-8B/
Where ~/workspace/Meta-Llama-3-8B/ was downloaded/cloned from https://huggingface.co/meta-llama/Meta-Llama-3-8B
Here's an example, run from within the llama.cpp repo:
. ~/workspace/benchmarks-ai/venv/bin/activate
pip install -U "huggingface_hub[cli]"
huggingface-cli login
huggingface-cli download deepset/roberta-base-squad2 --local-dir models/roberta-base-squad2
python convert_hf_to_gguf.py models/roberta-base-squad2
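One way to see why a given conversion fails: convert_hf_to_gguf.py keys off the architecture the model declares in its config.json and rejects ones it doesn't know. This sketch uses a stand-in config written to /tmp (its contents are my assumption about what roberta-base-squad2 ships) rather than the real download:

```shell
# Stand-in for models/roberta-base-squad2/config.json (assumed contents)
cat > /tmp/config.json <<'EOF'
{"architectures": ["RobertaForQuestionAnswering"]}
EOF
# Print the declared architecture the converter dispatches on
python3 -c 'import json; print(json.load(open("/tmp/config.json"))["architectures"][0])'
# → RobertaForQuestionAnswering
```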
It appears that the conversion only supports LLaMA-style models (the RoBERTa model above is rejected), which torpedoes my hope of using llama-bench as a golden standard.
Install on Linux (the Python triton library can't be installed on macOS):
git clone git@github.com:majestic-labs-AI/benchmarks-ai.git
cd benchmarks-ai
python -m venv venv
. venv/bin/activate
pip install matplotlib numpy pandas ffmpeg setuptools torch transformers triton
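After the install, a quick sanity check that the key packages are importable in the active venv (a sketch; the output depends entirely on your environment):

```shell
python3 - <<'EOF'
# Probe each package without fully importing it (find_spec is cheap)
import importlib.util
for pkg in ("matplotlib", "numpy", "pandas", "torch", "transformers", "triton"):
    status = "ok" if importlib.util.find_spec(pkg) else "MISSING"
    print(pkg, status)
EOF
```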
When seeing this error:
RuntimeError: Found no NVIDIA driver on your system.
install an NVIDIA driver:
sudo apt update && sudo apt upgrade
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
lspci | grep -i nvidia
which should show the GPU, e.g.:
03:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
ubuntu-drivers devices
which lists candidate drivers, e.g.:
nvidia-driver-545
sudo ubuntu-drivers install
And when installing NVIDIA driver 545 fails with the following, it's because you're on Ubuntu Noble Numbat 24.04, whose kernel is new enough to have removed the DRM_UNLOCKED macro that the NVIDIA driver expects:
dpkg: dependency problems prevent configuration of nvidia-driver-545:
nvidia-driver-545 depends on nvidia-dkms-545 (<= 545.29.06-1); however:
Package nvidia-dkms-545 is not configured yet.
nvidia-driver-545 depends on nvidia-dkms-545 (>= 545.29.06); however:
Package nvidia-dkms-545 is not configured yet.
sudo nvim /usr/src/linux-headers-6.8.0-36/include/drm/drm_ioctl.h
and re-add the macro the driver expects:
#define DRM_UNLOCKED 0
then reboot:
sudo shutdown -r now