openshift/cluster-logging-operator

Eliminate the Promtheus operator dependency of the "arbitraryFSAccessThroughSMs"

bysnupy opened this issue · 4 comments

Describe the bug
Use the service serving CA in generated service serving secret instead of the CA cert on the Prometheus Operator local container filesystem.

  • Description on issues:
    Currently, Cluster Logging Operator assumes only running with Prometheus Operator in "openshift-monitoring", not "openshift-user-workload-monitoring" one.
    However Cluster Logging Operator(EFK) is a user worload treatment on ROSA, and also not providing openshift.io/cluster-monitoring: "true" label in "openshift-logging".
    The user workload Prometheues Operator in "openshift-user-workload-monitoring" does not allow to access the prometheus local FS,
    as a result "fluentd" ServiceMonitor with "/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt" always is failed to access.\

// cluster monitoring prometheus allow to access own local fs.

$ oc get prometheus k8s -o yaml -n openshift-monitoring
:
  arbitraryFSAccessThroughSMs: {}

// In contrast, user workload monitoring prometheus does not allow to access own local fs.

$ oc get prometheus user-workload -o yaml -n openshift-user-workload-monitoring
:
  arbitraryFSAccessThroughSMs:
    deny: true

It causes incorrect alert notification as follows,

https://issues.redhat.com/browse/LOG-1561

And service serving cert is consist of CA and server cert together basically[0],
I think we can get the same CA from generated service serving cert secret, "/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt" is duplicated with that.
Additionally getting CA cert from local fs is not secure than extracting the CA cert through secret and it depends on external prometheus CA cert too.

As a result, the following config would work well to communicate with any prometheus operator.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fluentd
  namespace: openshift-logging
spec:
  endpoints:
  - path: /metrics
    port: metrics
    scheme: https
    tlsConfig:
      ca:
        secret:
          key: tls.crt
          name: fluentd-metrics
      serverName: fluentd.openshift-logging.svc
  jobLabel: monitor-fluentd
  namespaceSelector:
    matchNames:
    - openshift-logging
  selector:
    matchLabels:
      logging-infra: support

[0] including the CA of the service serving certificate and its server cert together as follows.
https://github.com/openshift/library-go/blob/70a39c8ba7a12c5903391c0a3ae7849c0fb4af73/pkg/crypto/crypto.go#L755-L758

server := &TLSCertificateConfig{
  Certs: append([]*x509.Certificate{serverCrt}, ca.Config.Certs...),
  Key:   serverPrivateKey,
}

Environment

  • ROSA (v4.7.z)
  • ClusterLogging instance

Logs

level=warn ts=2021-07-02T02:39:00.701373761Z caller=operator.go:1675 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via tls config which Prometheus specification prohibits" servicemonitor=openshift-logging/fluentd namespace=openshift-user-workload-monitoring prometheus=user-workload

Expected behavior
The Prometheus operator in "openshift-user-workload-monitoring" can communicate with "fluentd" target through ServiceMonitor.

Actual behavior
Evaluated the invalid servicemonitor TLSconfig and firing incorrect alert.

To Reproduce
After removing openshift.io/cluster-monitoring: "true" label from "openshift-logging", install cluster logging operator(EFK).
Then you can reproduce this issue as always.

Additional context
Add any other context about the problem here.

Closing. The CLO does not control the "prometheus operator"

@jcantrill I mean CLO does not need /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt for secure access. Why does CLO servicemonitor require service-ca.crt which is placed on Prometheus pod ? Own service serving cert has the same CA and concerned server cert together in advance, it's enough to secure access.

@jcantrill I mean CLO does not need /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt for secure access. Why does CLO servicemonitor require service-ca.crt which is placed on Prometheus pod ? Own service serving cert has the same CA and concerned server cert together in advance, it's enough to secure access.

I'm not following your question. The CLO desires for prometheus to scrape metrics of its various components. It is required to do so in a secure manor. The CLO uses the cert signing service of the cluster to request a set of certificates that:

  • prometheus uses to scrape the endpoint
  • that various components use to guard the endpoint
    It additionally creates a ServiceMonitor to do what is required to "tell" prometheus what needs to be scraped

This is how we were advised several release ago to get our metrics into prometheus. I can not speak to why the prometheus-operator does what it does or why this is the workflow that is needed. The CLO has nothing directly to do with the prometheus-operator. I urge you to reach out to that community to resolve your needs.

@jcantrill Thank you for your clarification. And I'm sorry for my poor explanations.
Let me please give me more chance to explain the details again.

As you mentioned, ServiceMonitor provides requirements to the prometheus.
In this case, CLO creates the following ServiceMonitor for scraping "fluentd-metrics" target. In other words, CLO tells the prometheus; "use your service serving CA cert on local fs" through the ServiecMonitor.

And I think CLO instructs prometheus-operator directly how to scrape the fluentd metrics, so I'd filed this issue here.

endpoint := monitoringv1.Endpoint{
Port: metricsPortName,
Path: "/metrics",
Scheme: "https",
TLSConfig: &monitoringv1.TLSConfig{
CAFile: prometheusCAFile,
ServerName: fmt.Sprintf("%s.%s.svc", fluentdName, cluster.Namespace),
// ServerName can be e.g. fluentd.openshift-logging.svc
},
}

	endpoint := monitoringv1.Endpoint{
		Port:   metricsPortName,
		Path:   "/metrics",
		Scheme: "https",
		TLSConfig: &monitoringv1.TLSConfig{
			CAFile:     prometheusCAFile,   <--- Pointed "/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt" on the prometheus.
			ServerName: fmt.Sprintf("%s.%s.svc", fluentdName, cluster.Namespace),
			// ServerName can be e.g. fluentd.openshift-logging.svc
		},
	}

And this issue can resolve using CA parameter instead of CAFile which is for referring the prometheus CA cert as follows. The CAs are basically the same each other. Could you please consider this ? That's what I'm talking about here.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fluentd
  namespace: openshift-logging
spec:
  endpoints:
  - path: /metrics
    port: metrics
    scheme: https
    tlsConfig:
      ca:
        secret:
          key: tls.crt
          name: fluentd-metrics   <--- This service serving cert secret is created by fluentd service defined by CLO.
      serverName: fluentd.openshift-logging.svc
  jobLabel: monitor-fluentd
  namespaceSelector:
    matchNames:
    - openshift-logging
  selector:
    matchLabels:
      logging-infra: support