openshift/cluster-logging-operator

cluster-logging-operator crashes since update to 5.5.5

bo0ts opened this issue · 2 comments

bo0ts commented

After updating to 5.5.5, the cluster-logging-operator pod went into CrashLoopBackOff. We deleted all cluster-logging-related resources, restarted the pod, and recreated two simple resources:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
  managementState: Managed
---
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: central-loki
      type: "loki"
      url: ...
      secret:
        name: ...
      loki:
        labelKeys: [log_type, kubernetes.namespace_name, kubernetes.pod_name, kubernetes.host, kubernetes.container_name, tag, kubernetes.labels.application]
  pipelines:
    - name: all-logs
      detectMultilineErrors: true
      inputRefs:
        - application
      outputRefs:
        - central-loki

After that the operator crashed again. This is the log from the crashed container:

{"_ts":"2022-12-16T15:00:43.754938909Z","_level":"0","_component":"cluster-logging-operator","_message":"starting up...","go_arch":"amd64","go_os":"linux","go_version":"go1.17.12","operator_version":"5.5"}
I1216 15:00:44.808441       1 request.go:665] Waited for 1.032755699s due to client-side throttling, not priority and fairness, request: GET:https://10.125.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
{"_ts":"2022-12-16T15:00:47.263982151Z","_level":"0","_component":"cluster-logging-operator","_message":"migrating resources provided by the manifest"}
{"_ts":"2022-12-16T15:00:47.268424117Z","_level":"0","_component":"cluster-logging-operator","_message":"Registering Components."}
{"_ts":"2022-12-16T15:00:47.268802872Z","_level":"0","_component":"cluster-logging-operator","_message":"Starting the Cmd."}
I1216 15:01:03.310238       1 request.go:665] Waited for 1.045965295s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/redhatcop.redhat.io/v1alpha1?timeout=32s
I1216 15:01:13.359277       1 request.go:665] Waited for 1.096298438s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/acme.cert-manager.io/v1?timeout=32s
I1216 15:01:23.360119       1 request.go:665] Waited for 1.093319323s due to client-side throttling, not priority and fairness, request: GET:https://..:443/apis/operators.coreos.com/v2?timeout=32s
I1216 15:01:33.409059       1 request.go:665] Waited for 1.1461561s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/route.openshift.io/v1?timeout=32s
I1216 15:01:43.413097       1 request.go:665] Waited for 1.149125094s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/operator.openshift.io/v1?timeout=32s
I1216 15:01:53.459623       1 request.go:665] Waited for 1.195711459s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/kafka.strimzi.io/v1beta1?timeout=32s
I1216 15:02:03.459681       1 request.go:665] Waited for 1.197635503s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/snapshot.storage.k8s.io/v1?timeout=32s
I1216 15:02:13.460036       1 request.go:665] Waited for 1.193477284s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/apps/v1?timeout=32s
I1216 15:02:23.509903       1 request.go:665] Waited for 1.24708191s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/console.openshift.io/v1?timeout=32s
{"_ts":"2022-12-16T15:02:26.120857569Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterlogforwarders.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}
I1216 15:02:33.525822       1 request.go:665] Waited for 1.149243647s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/template.openshift.io/v1?timeout=32s
I1216 15:02:43.560108       1 request.go:665] Waited for 1.295375528s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/cloudcredential.openshift.io/v1?timeout=32s
I1216 15:02:53.609422       1 request.go:665] Waited for 1.346344689s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/metal3.io/v1alpha1?timeout=32s
{"_ts":"2022-12-16T15:03:14.781236971Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterlogforwarders.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}
{"_ts":"2022-12-16T15:03:14.781493976Z","_level":"0","_component":"cluster-logging-operator","_message":"Manager exited non-zero","_error":{"msg":"failed to wait for clusterlogging caches to sync: timed out waiting for cache to be synced"}}

Environment

  • OpenShift 4.9
  • cluster-logging-operator 5.5.5

Same problem here: OCP 4.11, cluster-logging-operator 5.5.5, with the exact same failure message, i.e.:

failed to wait for clusterlogging caches to sync: timed out waiting for cache to be synced

Note that this was not occurring prior to the update to 5.5.5 (via operator subscription).