Finalise Notebooks alert rules work done during Observability Workshop
Closed this issue · 1 comments
i-chvets commented
Finalise the work done for Jupyter Controller during the Observability Workshop.
Work items are tracked in https://warthogs.atlassian.net/browse/KF-827
Branch: https://github.com/canonical/notebook-operators/tree/kf-827-gh81-feat-alert-rules
Prometheus deployment: https://github.com/canonical/prometheus-k8s-operator
Design
Failure alerts are implemented through integration with the Prometheus charm from the Canonical Observability Stack. The Jupyter Controller charm defines the alert rules and passes them to Prometheus over the relation, together with its metrics endpoint. Prometheus then creates a scrape job for that endpoint, scrapes the target, retrieves the defined metrics, and evaluates the alert rules against them.
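As a rough illustration of the design, the snippet below writes one alert rule file of the kind a charm can ship to Prometheus. It is a sketch only: the directory `src/prometheus_alert_rules`, the file name, the alert name `JupyterControllerUnitIsUnavailable`, the threshold, and the use of the `%%juju_topology%%` placeholder are assumptions for illustration and are not taken from the branch linked above.

```shell
# Hypothetical rule directory; override RULES_DIR if the charm keeps rules elsewhere.
RULES_DIR="${RULES_DIR:-./src/prometheus_alert_rules}"
mkdir -p "$RULES_DIR"

# The quoted 'EOF' delimiter stops the shell from expanding $labels below.
# Alert name, duration and severity are illustrative values only.
cat > "$RULES_DIR/unit_unavailable.rule" <<'EOF'
alert: JupyterControllerUnitIsUnavailable
expr: up{%%juju_topology%%} < 1
for: 5m
labels:
  severity: critical
annotations:
  summary: Jupyter Controller unit {{ $labels.juju_unit }} is unavailable
EOF
```

The `up` metric is set by Prometheus itself for every scrape target, so a rule like this fires when the scrape of the controller's metrics endpoint keeps failing.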
Testing
- Set up a MicroK8s cluster and a Juju controller:
microk8s enable dns storage metallb:"10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111"
juju bootstrap microk8s uk8s
juju add-model test
- Deploy Prometheus and Jupyter Controller, then relate them:
juju deploy prometheus-k8s --trust
juju deploy ./jupyter-controller_ubuntu-20.04-amd64.charm jupyter-controller --series kubernetes --trust --resource oci-image="docker.io/kubeflownotebookswg/notebook-controller:v1.6.1"
juju relate prometheus-k8s jupyter-controller
Final deployment should be:
Model  Controller  Cloud/Region        Version  SLA          Timestamp
test   uk8s        microk8s/localhost  2.9.34   unsupported  09:26:08-05:00

App                 Version                         Status  Scale  Charm               Channel  Rev  Address        Exposed  Message
jupyter-controller  .../notebook-controller:v1.6.1  active      1  jupyter-controller             0                 no
prometheus-k8s      2.33.5                          active      1  prometheus-k8s      stable    79  10.152.183.15  no

Unit                   Workload  Agent  Address     Ports  Message
jupyter-controller/0*  active    idle   10.1.59.86
prometheus-k8s/0*      active    idle   10.1.59.85

Relation provider                    Requirer                         Interface          Type     Message
jupyter-controller:metrics-endpoint  prometheus-k8s:metrics-endpoint  prometheus_scrape  regular
prometheus-k8s:prometheus-peers      prometheus-k8s:prometheus-peers  prometheus_peers   peer
- Navigate to the Prometheus dashboard at https://<Prometheus-unit-IP>:9090 and select Status->Targets. There should be a Prometheus scrape job entry that targets the Jupyter Controller metrics endpoint (http://:8080/metrics) with no errors.
Received alerts/rules can also be verified under the Alerts tab.
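The checks above can also be run from a terminal. The helpers below are a minimal sketch, not part of the merged work: the function names, the `WAIT_INTERVAL` variable, and the default `http` scheme are assumptions (set `PROM_SCHEME=https` if the dashboard is served over TLS). The `/api/v1/targets` and `/api/v1/rules` paths are Prometheus HTTP API endpoints that return the same data shown under Status->Targets and Alerts.

```shell
# Assumed defaults; adjust to match the deployment.
PROM_SCHEME="${PROM_SCHEME:-http}"
PROM_PORT="${PROM_PORT:-9090}"

# Poll `juju status` until the given unit reports an "active" workload.
# Usage: wait_for_active <unit> [max-tries]
wait_for_active() {
  unit="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # The leader unit is printed with a trailing "*", so match both forms.
    if juju status "$unit" | awk -v u="$unit" \
        '$1 == u || $1 == u"*" { found = ($2 == "active") } END { exit !found }'; then
      return 0
    fi
    sleep "${WAIT_INTERVAL:-10}"
    i=$((i + 1))
  done
  return 1
}

# Build a Prometheus HTTP API URL from a unit IP and an API path.
prom_api_url() {
  printf '%s://%s:%s/api/v1/%s\n' "$PROM_SCHEME" "$1" "$PROM_PORT" "$2"
}

# Fetch the scrape targets (the data behind Status->Targets) as JSON.
prom_targets() { curl -sS "$(prom_api_url "$1" targets)"; }

# Fetch the loaded alert rules (the data behind the Alerts tab) as JSON.
prom_rules() { curl -sS "$(prom_api_url "$1" rules)"; }
```

For example, with the addresses from the status output above, `wait_for_active jupyter-controller/0 && prom_targets 10.1.59.85` prints the target list once the unit is active; each active target carries a `health` field that should read `up` when scraping works.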
i-chvets commented
PR is merged, closing.