sustainable-computing-io/kepler-doc

Installation on Prometheus Operator on Openshift can break Cluster Monitoring


sthaha commented

The Deploy section of the Kepler docs recommends installing Prometheus Operator. On OpenShift, this results in two Prometheus Operator instances running in the cluster. If the new one is not properly configured, it can render the cluster's in-platform monitoring unusable, because it can reconcile the prometheus-k8s instance in the openshift-monitoring namespace.

Did you install Prometheus Operator in the openshift-monitoring or monitoring namespace?

sthaha commented

Did you install Prometheus Operator in the openshift-monitoring or monitoring namespace?

Neither, but if I were to do it, I wouldn't touch the openshift-monitoring namespace.

I did not install Prometheus Operator (PO). Based on previous experience and on reading the deployment YAML, I was fairly sure that the PO in kube-prometheus does not limit the resources it watches.
See: https://github.com/prometheus-operator/kube-prometheus/blob/dc0ad5e2162110c31c0c08d097f688145ce8e229/manifests/prometheusOperator-deployment.yaml#L29

      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.65.2
        image: quay.io/prometheus-operator/prometheus-operator:v0.65.2

In contrast, the in-cluster Prometheus Operator restricts the namespaces whose instances it manages: https://github.com/openshift/cluster-monitoring-operator/blob/076da3ba2d27edb00765cff6a51b0b7a2785ce03/assets/prometheus-operator/deployment.yaml#LL34C8-L38C66

      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.65.1
        - --prometheus-instance-namespaces=openshift-monitoring
        - --thanos-ruler-instance-namespaces=openshift-monitoring
        - --alertmanager-instance-namespaces=openshift-monitoring

Without such restrictions, the new PO would reconcile all Prometheus instances, including the in-cluster one. That makes the in-cluster monitoring Prometheus unstable, because both operators start updating it.
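
As an untested sketch of what "properly configured" could look like, the kube-prometheus operator could be given the same kind of instance-namespace flags that the in-cluster operator uses. This assumes the second operator is deployed to the monitoring namespace (the kube-prometheus default) and should only manage instances there; adjust the namespace to your setup.

```yaml
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.65.2
        # Assumption: this operator should only manage instances in "monitoring",
        # so it never reconciles prometheus-k8s in openshift-monitoring.
        - --prometheus-instance-namespaces=monitoring
        - --alertmanager-instance-namespaces=monitoring
        - --thanos-ruler-instance-namespaces=monitoring
        image: quay.io/prometheus-operator/prometheus-operator:v0.65.2
```

This only scopes which Prometheus, Alertmanager, and ThanosRuler objects the operator reconciles. Whether it is enough to make two operators coexist safely on OpenShift would still need to be verified.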

I think it may be easier to use user-workload-monitoring on OpenShift - https://docs.openshift.com/container-platform/4.13/monitoring/configuring-the-monitoring-stack.html
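
For reference, a minimal sketch of that route: per the OpenShift monitoring docs, user workload monitoring is enabled through the cluster-monitoring-config ConfigMap, after which a ServiceMonitor in the workload's namespace gets picked up. The ServiceMonitor below is hypothetical: the kepler namespace, the label selector, and the http port name are assumptions and must match the actual Kepler Service.

```yaml
# Enable monitoring for user-defined projects.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
---
# Hypothetical ServiceMonitor for Kepler; namespace, labels, and port name
# are assumptions, not taken from the Kepler manifests.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kepler-exporter
  namespace: kepler
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kepler-exporter
  endpoints:
  - port: http
    interval: 30s
```

If cluster-monitoring-config already exists, the enableUserWorkload key should be merged into the existing config.yaml rather than overwriting the ConfigMap.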