Finalise work done for Seldon Core Operator during Observability Workshop

Question

Closed this issue 2 years ago · 1 comments

Work items are tracked in https://warthogs.atlassian.net/browse/KF-775
Branch: https://github.com/canonical/seldon-core-operator/tree/kf-775-gh52-feat-alert-rules
Prometheus deployment: https://github.com/canonical/prometheus-k8s-operator

Design

Failure alerts are implemented through integration with the Prometheus charm from the Canonical Observability Stack. The Seldon Core Operator charm defines the alert rules, and Prometheus creates the corresponding scrape jobs once the two charms are related. Prometheus then scrapes the targets, retrieves the defined metrics, and performs the calculations required by the alert rules.
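To make the design more concrete, the sketch below writes out what a single failure alert rule could look like. It is illustrative only: the rule name, severity label, summary text, the helper name write_example_rule, and the choice of the controller-runtime metric controller_runtime_reconcile_total are all assumptions, not the rules shipped in the branch above.

```shell
# Hypothetical helper: write an illustrative single-rule alert file to the path given as $1.
# The rule name, labels, and expression are assumptions for illustration only.
write_example_rule() {
  # The heredoc delimiter is quoted so that {{ $labels... }} is written literally.
  cat > "$1" <<'EOF'
alert: SeldonReconcileErrors
expr: rate(controller_runtime_reconcile_total{result="error"}[10m]) > 0
for: 0m
labels:
  severity: critical
annotations:
  summary: "Seldon controller reconcile errors on {{ $labels.juju_unit }}"
EOF
}

# Usage (the target path is an assumption about the charm layout):
#   write_example_rule src/prometheus_alert_rules/example.rule
```

The 10m range in the expression mirrors the 10-minute alert window mentioned in the testing note; a rate over that window needs at least two scraped samples before it yields a value.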

Testing

Set up a MicroK8s cluster and a Juju controller:

microk8s enable dns storage metallb:"10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111"
juju bootstrap microk8s uk8s
juju add-model test

Deploy Prometheus and Seldon Core Operator, then relate them:

juju deploy prometheus-k8s --trust
juju deploy ./seldon-core_ubuntu-20.04-amd64.charm seldon-controller-manager --trust --resource oci-image="docker.io/seldonio/seldon-core-operator:1.14.0"
juju relate prometheus-k8s seldon-controller-manager

Navigate to the Prometheus dashboard at https://<Prometheus-unit-IP>:9090 and select Status->Targets.
There should be a Prometheus scrape job entry that targets the Seldon metrics endpoint (http://<Seldon-Controller-Manager-IP>:8080/metrics) and shows no errors.
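The same check can be scripted against the Prometheus HTTP API instead of the dashboard. This is a minimal sketch, assuming the /api/v1/targets endpoint is reachable at the dashboard address; the helper names targets_url and count_unhealthy are hypothetical, and the grep-based parsing is a rough check rather than a real JSON parser.

```shell
# Hypothetical helper: build the targets API URL from a Prometheus base URL.
# A trailing slash on the base URL is stripped first.
targets_url() {
  printf '%s/api/v1/targets\n' "${1%/}"
}

# Hypothetical helper: read a targets API response on stdin and print how many
# targets report a health value other than "up" (for example "down" or "unknown").
count_unhealthy() {
  # grep -c exits non-zero when the count is 0, so the status is ignored.
  grep -o '"health":"[a-z]*"' | grep -vc '"health":"up"' || true
}

# Usage (needs network access to the Prometheus unit; -k skips TLS verification):
#   curl -sk "$(targets_url "https://<Prometheus-unit-IP>:9090")" | count_unhealthy
```

A result of 0 means every scrape target in the response is reported as up; it does not confirm that the Seldon endpoint is among them, so the response should still be inspected once by hand.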
Deploy a sample Seldon deployment in the same model, then navigate to Alerts to check whether any failure alert is reported:

microk8s.kubectl -n test apply -f examples/serve-simple-v1.yaml

To simulate a failure, delete the deployment that was created by Seldon and observe the alerts:

microk8s.kubectl -n test delete deploy/seldon-model-example-0-classifier

NOTE: The alert window is 10 minutes and scraping is done once per minute. Make sure at least 2 minutes have passed so that the rate can be calculated properly.
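The timing in the note can be turned into a simple polling loop: check once per scrape interval, for up to the length of the alert window. This is a sketch under assumptions: it uses the Prometheus /api/v1/alerts endpoint, PROM_URL is a placeholder variable for the dashboard address, and fetch_alerts, count_firing, and wait_for_alert are hypothetical helper names.

```shell
# Hypothetical hook: fetch the alerts API response. PROM_URL is a placeholder,
# e.g. PROM_URL="https://<Prometheus-unit-IP>:9090".
fetch_alerts() {
  curl -sk "${PROM_URL}/api/v1/alerts"
}

# Print the number of alerts in state "firing" found in the response on stdin.
count_firing() {
  grep -o '"state":"firing"' | grep -c . || true
}

# Poll until at least one alert fires.
# $1: number of checks (default 10, matching the 10-minute window)
# $2: seconds between checks (default 60, matching the scrape interval)
wait_for_alert() {
  local attempts="${1:-10}"
  local interval="${2:-60}"
  local i=1
  local firing
  while [ "$i" -le "$attempts" ]; do
    firing="$(fetch_alerts | count_firing)"
    if [ "${firing:-0}" -gt 0 ]; then
      echo "firing alerts: ${firing} (after ${i} checks)"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "no firing alerts after ${attempts} checks"
  return 1
}

# Usage: PROM_URL="https://<Prometheus-unit-IP>:9090" wait_for_alert
```

Since a rate needs two samples, the first check or two after deleting the deployment are expected to report nothing; the loop simply keeps polling until an alert fires or the window is exhausted.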

Answer 1 · 2022-12-06T23:36:16.000Z

Closing this issue.