benchmarks-ai

Scripts to benchmark AI LLMs, hardware (e.g. NVIDIA T4, Intel CPUs), and libraries (PyTorch, Triton)


Benchmarking Large Language Model (LLM) Artificial Intelligence (AI) Performance

| model               |      size | params | backend | threads | test  |         t/s |
| ------------------- | --------: | -----: | ------- | ------: | ----- | ----------: |
| Meta llama 3 8B F16 | 14.96 GiB | 8.03 B | CPU     |       4 | pp512 | 8.95 ± 0.99 |
| Meta llama 3 8B F16 | 14.96 GiB | 8.03 B | CPU     |       4 | tg128 | 2.15 ± 0.02 |
  • We're using Georgi Gerganov's llama-bench
  • llama-bench benchmarks only one engine (executor); it doesn't benchmark, for example, Triton
  • Headings
    • model: LLM Model, such as Meta's Meta-Llama-3-8B.
    • size: the model's size, i.e. the memory footprint of its weights, e.g. 14.96 GiB
    • params: number of the model's parameters, e.g. "8.03 B" (8 billion)
    • backend:
      • CPU: you don't want this ;-)
      • GPU: Graphics Processing Unit; this is what you do want
    • threads: number of CPU threads used, e.g. "4"
    • test: type of test performed:
      • pp: "Prompt processing: processing a prompt in batches"; pp512 processes a 512-token prompt. Higher is better.
      • tg: "Text generation: generating a sequence of tokens"; tg128 generates 128 tokens. Higher is better.
    • t/s: throughput in tokens per second, reported as mean ± standard deviation over repeated runs.

Typical invocation to run benchmark against a model:

./llama-bench -m models/ggml-model-f16.gguf
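For intuition, t/s is plain throughput: tokens divided by wall-clock seconds. A toy calculation (the numbers are made up to match the tg128 row above, not taken from a real run):

```python
def tokens_per_second(n_tokens: int, seconds: float) -> float:
    # Throughput as reported in llama-bench's t/s column.
    return n_tokens / seconds

# e.g. generating 128 tokens in 59.5 seconds:
print(round(tokens_per_second(128, 59.5), 2))  # 2.15
```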

To convert a model from Hugging Face to llama.cpp's .gguf format:

python convert_hf_to_gguf.py ~/workspace/Meta-Llama-3-8B/

Where ~/workspace/Meta-Llama-3-8B/ was downloaded/cloned from https://huggingface.co/meta-llama/Meta-Llama-3-8B

Here's an example, run from within the llama.cpp repo:

. ~/workspace/benchmarks-ai/venv/bin/activate
pip install -U "huggingface_hub[cli]"
huggingface-cli login
huggingface-cli download deepset/roberta-base-squad2 --local-dir models/roberta-base-squad2
python convert_hf_to_gguf.py models/roberta-base-squad2
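GGUF files begin with the 4-byte ASCII magic "GGUF", so there's a cheap sanity check that a conversion produced a plausible file. This sketch checks only the magic, not the full header:

```python
import os
import tempfile

def looks_like_gguf(path: str) -> bool:
    # GGUF files start with the 4-byte magic b"GGUF".
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo against a fake file; in practice, point it at models/*.gguf.
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as f:
    f.write(b"GGUF" + b"\x00" * 12)
print(looks_like_gguf(f.name))  # True
os.unlink(f.name)
```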

It appears that the conversion only supports LLaMA-family models, which torpedoes my hope of using llama-bench as a gold standard.

Setting up a Google L4

Setting up a Noble Numbat AI workstation

Install on Linux (the Python triton library doesn't install on macOS):

git clone git@github.com:majestic-labs-AI/benchmarks-ai.git
cd benchmarks-ai
python -m venv venv
. venv/bin/activate
pip install matplotlib numpy pandas ffmpeg setuptools torch transformers triton
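To confirm the installs actually resolved, a quick check with importlib (the import names below are assumed to match the pip package names):

```python
from importlib.util import find_spec

def missing(modules):
    """Return the module names that can't be found by this interpreter."""
    return [m for m in modules if find_spec(m) is None]

wanted = ["matplotlib", "numpy", "pandas", "torch", "transformers", "triton"]
print(missing(wanted))  # [] when everything installed cleanly
```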

Troubleshooting

When seeing this error:

RuntimeError: Found no NVIDIA driver on your system.

install the NVIDIA driver (indented lines below show example output):

sudo apt update && sudo apt upgrade
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
lspci | grep -i nvidia
  03:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
ubuntu-drivers devices
  nvidia-driver-545
sudo ubuntu-drivers install

And when installing NVIDIA driver 545 fails with the following because you're on Ubuntu Noble Numbat 24.04, whose kernel is new enough to have removed the DRM_UNLOCKED macro that the NVIDIA driver expects:

dpkg: dependency problems prevent configuration of nvidia-driver-545:
 nvidia-driver-545 depends on nvidia-dkms-545 (<= 545.29.06-1); however:
  Package nvidia-dkms-545 is not configured yet.
 nvidia-driver-545 depends on nvidia-dkms-545 (>= 545.29.06); however:
  Package nvidia-dkms-545 is not configured yet.
re-add the macro to the kernel header and reboot:

sudo nvim /usr/src/linux-headers-6.8.0-36/include/drm/drm_ioctl.h
#define DRM_UNLOCKED 0
sudo shutdown -r now