Automated llm-d inference benchmarking on OpenShift using GuideLLM, with MLflow tracking and GitHub Actions integration.
This may work with any other LLM endpoint, but it has only been tested with llm-d endpoints.
This project uses GuideLLM for benchmarking, MLflow (with a PostgreSQL backend and S3 artifact storage) for experiment tracking, self-hosted GitHub Actions runners, Helm for deployment, and the Kueue and Reflector add-ons, all running on OpenShift.
> **Note:** The AWS IAM policy is handled by the user; see `mlflow/AWS_IAM_POLICY.md` for details.
```bash
# Copy and configure environment
cp .env.example .env
# Edit .env with your credentials

# Deploy MLflow, PostgreSQL, and GitHub runners
./bootstrap.sh

# Dry run
./bootstrap.sh --dry-run
```

This deploys:
- MLflow - Experiment tracking with PostgreSQL backend and S3 storage
- Self-hosted GitHub runners - Run benchmarks via PR comments
- Custom benchmark image - Built and pushed to OpenShift registry
- All required add-ons/operators (Kueue, Reflector)
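After the script finishes, you can sanity-check the rollout. This is a rough sketch that assumes MLflow lands in the `mlflow` namespace (the same namespace used by the route command further below) and that you are logged in with `oc`:

```bash
# Verify the MLflow stack came up (namespace assumed to be "mlflow")
oc get pods -n mlflow

# List the routes created by the bootstrap, including the MLflow UI route
oc get routes -n mlflow
```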
Via GitHub Actions (recommended):
```bash
# Comment on any PR:
/benchmark qwen-0.6b-baseline

# With parameter overrides:
/benchmark qwen-0.6b-baseline
benchmark.maxSeconds=600
```
> **Warning:** This repo does not handle llm-d deployment; make sure the model you want to benchmark is already running, otherwise the benchmark will fail.
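A quick way to confirm the serving side is up before triggering a run. This sketch assumes an OpenAI-compatible llm-d endpoint; the URL is a placeholder, not a value from this repo:

```bash
# Hypothetical endpoint URL; use the value you set as benchmark.target
ENDPOINT="http://<your-llm-d-endpoint>"

# An OpenAI-compatible server lists its loaded model(s) here
curl -s "${ENDPOINT}/v1/models"
```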
Via Helm:
```bash
helm install <your_deployment_name> ./llm-d-bench \
  -f llm-d-bench/experiments/qwen-0.6b-baseline.yaml \
  -n <your_namespace>
```

See `llm-d-bench/ADDING_BENCHMARKS.md` for adding new benchmark tools.
Quick summary:
- Add the benchmark implementation to `llm-d-bench/templates/benchmarks/<tool-name>/`
- Create an experiment config in `llm-d-bench/experiments/`
- Trigger via `/benchmark <experiment-name>` in PR comments
For new experiments, add them in `llm-d-bench/experiments/`; a rough sketch follows the note below.
> **Note:** Experiment names cannot include `.` for security reasons.
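As a rough illustration, a new experiment file could override the parameters documented in `llm-d-bench/values.yaml` further below. The file name and all values here are hypothetical; adjust the keys to match your deployment:

```bash
# Hypothetical experiment config -- adjust keys/values to your deployment
cat > llm-d-bench/experiments/my-model-baseline.yaml <<'EOF'
benchmark:
  target: http://my-llm-d-endpoint:8000   # inference endpoint
  model: my-model                         # model name as served
  rate: "{1,50,100}"                      # concurrent request rates
  maxSeconds: 600                         # max runtime in seconds
mlflow:
  enabled: true                           # track the run in MLflow
EOF
```

You would then trigger it with `/benchmark my-model-baseline` in a PR comment.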
The benchmark workflow (`.github/workflows/benchmark.yaml`) triggers on PR comments:
How it works:
- User comments `/benchmark <experiment>` on a PR
- Self-hosted runner picks up the job
- Checks out PR branch
- Runs Helm install with experiment config
- Waits for job completion (up to 12 hours)
- Reacts with 🚀 on success or 😕 on failure
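If you want to follow a run from the cluster side while the workflow waits, something like the following works; the namespace and job name are placeholders:

```bash
# Watch the benchmark job created by the Helm release (names are placeholders)
oc get jobs -n <your_namespace> -w

# Block until completion, mirroring the workflow's 12-hour ceiling
oc wait --for=condition=complete job/<benchmark-job-name> \
  -n <your_namespace> --timeout=12h
```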
Requirements:
- Self-hosted runner with label `openshift`
- GitHub environment named `benchmark`
- OpenShift secrets: `OPENSHIFT_SERVER_URL`, `OPENSHIFT_CA_CERT`, `OPENSHIFT_TOKEN`
- Only the repository owner can trigger benchmarks
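One way to provide the OpenShift secrets is with the GitHub CLI. This is a sketch with placeholder values, assuming `gh` is authenticated against the repo; whether the secrets belong at the repository level or in the `benchmark` environment depends on your setup:

```bash
# Repository-level secrets; add --env benchmark if they live in that environment
gh secret set OPENSHIFT_SERVER_URL --body "https://api.<cluster-domain>:6443"
gh secret set OPENSHIFT_CA_CERT < /path/to/ca.crt
gh secret set OPENSHIFT_TOKEN --body "$(oc whoami -t)"
```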
Environment variables in `.env`:

MLflow:

```bash
POSTGRES_PASSWORD=your-password
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
S3_BUCKET_NAME=your-bucket
AWS_REGION=us-east-1
MLFLOW_ADMIN_PASSWORD=your-password
```

GitHub Runners:

```bash
GITHUB_TOKEN=ghp_your_token
GITHUB_OWNER=your-org-or-username
GITHUB_REPOSITORY= # Empty for org-wide runners
RUNNER_LABELS=openshift,self-hosted
RUNNER_REPLICAS=2
```

Key parameters in `llm-d-bench/values.yaml`:
- `benchmark.target` - Target inference endpoint
- `benchmark.model` - Model name
- `benchmark.rate` - Concurrent request rates (e.g., `{1,50,100}`)
- `benchmark.data` - Number of requests or token specs
- `benchmark.maxSeconds` - Max runtime (default: 600s)
- `mlflow.enabled` - Enable MLflow tracking
- `kueue.enabled` - Enable Kueue queues
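Any of these can also be overridden at install time without editing the experiment file, using Helm's standard `--set` flag; the release name and override values below are only examples:

```bash
# One-off override without editing the experiment file (example values)
helm install my-bench ./llm-d-bench \
  -f llm-d-bench/experiments/qwen-0.6b-baseline.yaml \
  --set benchmark.maxSeconds=900 \
  --set mlflow.enabled=true \
  -n <your_namespace>
```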
- MLflow - Experiments are tracked if `mlflow.enabled=True`
Access MLflow UI:
```bash
oc get route mlflow -n mlflow -o jsonpath='{.spec.host}'
# Login with credentials from .env
```
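To turn the route into a clickable URL in one step (a convenience sketch, assuming the route is served over HTTPS):

```bash
# Print the full MLflow UI URL
echo "https://$(oc get route mlflow -n mlflow -o jsonpath='{.spec.host}')"
```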