Eliminate the Promtheus operator dependency of the "arbitraryFSAccessThroughSMs"
bysnupy opened this issue · 4 comments
Describe the bug
Use the service serving CA in generated service serving secret instead of the CA cert on the Prometheus Operator local container filesystem.
- Description on issues:
Currently, Cluster Logging Operator assumes only running with Prometheus Operator in "openshift-monitoring", not "openshift-user-workload-monitoring" one.
However Cluster Logging Operator(EFK) is a user worload treatment on ROSA, and also not providingopenshift.io/cluster-monitoring: "true"
label in "openshift-logging".
The user workload Prometheues Operator in "openshift-user-workload-monitoring" does not allow to access the prometheus local FS,
as a result "fluentd" ServiceMonitor with "/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt" always is failed to access.\
// cluster monitoring prometheus allow to access own local fs.
$ oc get prometheus k8s -o yaml -n openshift-monitoring
:
arbitraryFSAccessThroughSMs: {}
// In contrast, user workload monitoring prometheus does not allow to access own local fs.
$ oc get prometheus user-workload -o yaml -n openshift-user-workload-monitoring
:
arbitraryFSAccessThroughSMs:
deny: true
It causes incorrect alert notification as follows,
https://issues.redhat.com/browse/LOG-1561
And service serving cert is consist of CA and server cert together basically[0],
I think we can get the same CA from generated service serving cert secret, "/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt" is duplicated with that.
Additionally getting CA cert from local fs is not secure than extracting the CA cert through secret and it depends on external prometheus CA cert too.
As a result, the following config would work well to communicate with any prometheus operator.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: fluentd
namespace: openshift-logging
spec:
endpoints:
- path: /metrics
port: metrics
scheme: https
tlsConfig:
ca:
secret:
key: tls.crt
name: fluentd-metrics
serverName: fluentd.openshift-logging.svc
jobLabel: monitor-fluentd
namespaceSelector:
matchNames:
- openshift-logging
selector:
matchLabels:
logging-infra: support
[0] including the CA of the service serving certificate and its server cert together as follows.
https://github.com/openshift/library-go/blob/70a39c8ba7a12c5903391c0a3ae7849c0fb4af73/pkg/crypto/crypto.go#L755-L758
server := &TLSCertificateConfig{
Certs: append([]*x509.Certificate{serverCrt}, ca.Config.Certs...),
Key: serverPrivateKey,
}
Environment
- ROSA (v4.7.z)
- ClusterLogging instance
Logs
level=warn ts=2021-07-02T02:39:00.701373761Z caller=operator.go:1675 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via tls config which Prometheus specification prohibits" servicemonitor=openshift-logging/fluentd namespace=openshift-user-workload-monitoring prometheus=user-workload
Expected behavior
The Prometheus operator in "openshift-user-workload-monitoring" can communicate with "fluentd" target through ServiceMonitor.
Actual behavior
Evaluated the invalid servicemonitor TLSconfig and firing incorrect alert.
To Reproduce
After removing openshift.io/cluster-monitoring: "true"
label from "openshift-logging", install cluster logging operator(EFK).
Then you can reproduce this issue as always.
Additional context
Add any other context about the problem here.
Closing. The CLO does not control the "prometheus operator"
@jcantrill I mean CLO does not need /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
for secure access. Why does CLO servicemonitor require service-ca.crt
which is placed on Prometheus pod ? Own service serving cert has the same CA and concerned server cert together in advance, it's enough to secure access.
@jcantrill I mean CLO does not need
/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
for secure access. Why does CLO servicemonitor requireservice-ca.crt
which is placed on Prometheus pod ? Own service serving cert has the same CA and concerned server cert together in advance, it's enough to secure access.
I'm not following your question. The CLO desires for prometheus to scrape metrics of its various components. It is required to do so in a secure manor. The CLO uses the cert signing service of the cluster to request a set of certificates that:
- prometheus uses to scrape the endpoint
- that various components use to guard the endpoint
It additionally creates a ServiceMonitor to do what is required to "tell" prometheus what needs to be scraped
This is how we were advised several release ago to get our metrics into prometheus. I can not speak to why the prometheus-operator does what it does or why this is the workflow that is needed. The CLO has nothing directly to do with the prometheus-operator. I urge you to reach out to that community to resolve your needs.
@jcantrill Thank you for your clarification. And I'm sorry for my poor explanations.
Let me please give me more chance to explain the details again.
As you mentioned, ServiceMonitor provides requirements to the prometheus.
In this case, CLO creates the following ServiceMonitor for scraping "fluentd-metrics" target. In other words, CLO tells the prometheus; "use your service serving CA cert on local fs" through the ServiecMonitor.
And I think CLO instructs prometheus-operator directly how to scrape the fluentd metrics, so I'd filed this issue here.
cluster-logging-operator/pkg/k8shandler/fluentd.go
Lines 135 to 144 in dfa06c2
endpoint := monitoringv1.Endpoint{
Port: metricsPortName,
Path: "/metrics",
Scheme: "https",
TLSConfig: &monitoringv1.TLSConfig{
CAFile: prometheusCAFile, <--- Pointed "/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt" on the prometheus.
ServerName: fmt.Sprintf("%s.%s.svc", fluentdName, cluster.Namespace),
// ServerName can be e.g. fluentd.openshift-logging.svc
},
}
And this issue can resolve using CA parameter instead of CAFile which is for referring the prometheus CA cert as follows. The CAs are basically the same each other. Could you please consider this ? That's what I'm talking about here.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: fluentd
namespace: openshift-logging
spec:
endpoints:
- path: /metrics
port: metrics
scheme: https
tlsConfig:
ca:
secret:
key: tls.crt
name: fluentd-metrics <--- This service serving cert secret is created by fluentd service defined by CLO.
serverName: fluentd.openshift-logging.svc
jobLabel: monitor-fluentd
namespaceSelector:
matchNames:
- openshift-logging
selector:
matchLabels:
logging-infra: support