ModelMesh Metrics

ModelMesh Performance Scripts, Dashboard and Pipelines. Follow the instructions below to run a gRPC inference benchmark against a ModelMesh Serving instance on a Kubernetes cluster:

  1. Setup ModelMesh Serving
  2. Create Example Models
  3. Run gRPC Inference Benchmark
  4. Monitoring
  5. Test with KubeFlow Pipeline

Setup ModelMesh Serving

To quickly stand up a ModelMesh Serving instance on a Kubernetes cluster, see the Quickstart for details.
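For reference, a quickstart install follows this general shape (commands per the kserve/modelmesh-serving Quickstart; the release branch name here is an assumption, so check the Quickstart for the current one):

RELEASE=release-0.11
git clone -b $RELEASE --depth 1 --single-branch https://github.com/kserve/modelmesh-serving.git
cd modelmesh-serving
kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart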

Model Deployment

Switch to the ModelMesh Serving namespace (assumed here to be modelmesh-serving):

kubectl config set-context --current --namespace modelmesh-serving

We have included scripts to deploy models concurrently to a ModelMesh Serving instance. For example, the following deploys 10 simple-string TensorFlow models with a concurrency of 5:

cd multi_model_tester/
./deployNpredictors.sh 5 simple-string-tf 1 10 deploy_1simple_string_tf_predictor.sh
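Each deploy script creates Predictor custom resources for the models it deploys. As a rough sketch of what such a manifest looks like (the model path and storage secret key below follow the modelmesh-serving example models and are assumptions; the actual script may differ):

kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: Predictor
metadata:
  name: simple-string-tf-1
spec:
  modelType:
    name: tensorflow
  path: fvt/tensorflow/simple-string
  storage:
    s3:
      secretKey: localMinIO
EOF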

Verify that the 10 simple-string-tf-* models are Loaded:

kubectl get predictors | grep simple-string-tf
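Once loading completes, the output should look roughly like this (exact columns depend on the modelmesh-serving version):

NAME                 TYPE         AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
simple-string-tf-1   tensorflow   true        Loaded                      UpToDate     2m
simple-string-tf-2   tensorflow   true        Loaded                      UpToDate     2m
...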

Run Inference Requests

To send requests locally, first forward the serving port in a separate terminal:

kubectl port-forward svc/modelmesh-serving 8033

Then send inference requests to the 10 loaded simple-string-tf-* models using the multi_model_tester:

./multi_model_test -ma "SimpleStringTF" -npm 10 -qps 100 -dur 10
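A quick summary of the flags, based on the multi_model_tester sources (its built-in help is the authoritative reference): -ma selects the model archetype to exercise, -npm the number of deployed predictors of that archetype to spread requests across, -qps the target query rate, and -dur the test duration in seconds. With no -u flag the driver is assumed here to target the locally forwarded port; the in-cluster example below passes -u explicitly.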

Alternatively, run the driver from inside the Kubernetes cluster, in the namespace where the ModelMesh Serving instance is installed:

KUBECTL_COMMAND_HEADERS=false kubectl run -it --rm modelmesh-payload --restart=Never --image=aipipeline/modelmesh-payload:latest --command --namespace modelmesh-serving -- ./multi_model_test -ma 'SimpleStringTF' -u dns:///modelmesh-serving.modelmesh-serving:8033 -npm '10' -qps '100' -dur '10'
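The dns:/// scheme is gRPC's standard DNS resolver syntax; modelmesh-serving.modelmesh-serving is the in-cluster DNS name of the modelmesh-serving Service in the modelmesh-serving namespace, so the driver reaches the same endpoint that the port-forward exposed locally.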

In either case, the output should be similar to:

QPS: 100
dur: 9.993041819
earliest: 76448
latest: 9990014958
end: 9992985483
reqs: 1000
success: 1.000000
p50: 3.58451ms
p95: 10.701875ms
p99: 29.714907ms
mean: 4.733809ms
max: 37.563672ms
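As a sanity check on these numbers: 100 QPS sustained for roughly 10 seconds yields 100 × 10 = 1000 requests (reqs: 1000), and success: 1.000000 means all of them succeeded. p50, p95, p99, mean, and max summarize the response-latency distribution; dur is the measured wall-clock duration in seconds, while earliest, latest, and end appear to be nanosecond offsets within the run.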

Monitoring

Monitoring ModelMesh Serving metrics with Prometheus and a Grafana dashboard is highly recommended. See the Monitoring documentation for details.
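As a sketch of the kind of query the dashboard panels are built on, assuming Prometheus is port-forwarded to localhost:9090 and is scraping ModelMesh's modelmesh_api_request_milliseconds histogram (metric name per the ModelMesh docs; adjust for your deployment), the p95 external request latency can be fetched directly from the Prometheus HTTP API:

curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(modelmesh_api_request_milliseconds_bucket[1m])) by (le))'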

Test with KubeFlow Pipeline

Deploying models and sending inference requests can be automated using a KubeFlow Pipeline. See Setup KubeFlow Tekton for details.