cluster-logging-operator crashes since update to 5.5.5
bo0ts opened this issue · 2 comments
bo0ts commented
After updating to 5.5.5, the cluster-logging-operator pod went into CrashLoopBackOff. We deleted all cluster-logging-related resources, restarted the pod, and recreated two simple resources:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
  managementState: Managed
---
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: central-loki
      type: "loki"
      url: ...
      secret:
        name: ...
      loki:
        labelKeys: [log_type, kubernetes.namespace_name, kubernetes.pod_name, kubernetes.host, kubernetes.container_name, tag, kubernetes.labels.application]
  pipelines:
    - name: all-logs
      detectMultilineErrors: true
      inputRefs:
        - application
      outputRefs:
        - central-loki
After that the operator crashed again. This is the log from the crashed container:
{"_ts":"2022-12-16T15:00:43.754938909Z","_level":"0","_component":"cluster-logging-operator","_message":"starting up...","go_arch":"amd64","go_os":"linux","go_version":"go1.17.12","operator_version":"5.5"}
I1216 15:00:44.808441 1 request.go:665] Waited for 1.032755699s due to client-side throttling, not priority and fairness, request: GET:https://10.125.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
{"_ts":"2022-12-16T15:00:47.263982151Z","_level":"0","_component":"cluster-logging-operator","_message":"migrating resources provided by the manifest"}
{"_ts":"2022-12-16T15:00:47.268424117Z","_level":"0","_component":"cluster-logging-operator","_message":"Registering Components."}
{"_ts":"2022-12-16T15:00:47.268802872Z","_level":"0","_component":"cluster-logging-operator","_message":"Starting the Cmd."}
I1216 15:01:03.310238 1 request.go:665] Waited for 1.045965295s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/redhatcop.redhat.io/v1alpha1?timeout=32s
I1216 15:01:13.359277 1 request.go:665] Waited for 1.096298438s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/acme.cert-manager.io/v1?timeout=32s
I1216 15:01:23.360119 1 request.go:665] Waited for 1.093319323s due to client-side throttling, not priority and fairness, request: GET:https://..:443/apis/operators.coreos.com/v2?timeout=32s
I1216 15:01:33.409059 1 request.go:665] Waited for 1.1461561s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/route.openshift.io/v1?timeout=32s
I1216 15:01:43.413097 1 request.go:665] Waited for 1.149125094s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/operator.openshift.io/v1?timeout=32s
I1216 15:01:53.459623 1 request.go:665] Waited for 1.195711459s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/kafka.strimzi.io/v1beta1?timeout=32s
I1216 15:02:03.459681 1 request.go:665] Waited for 1.197635503s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/snapshot.storage.k8s.io/v1?timeout=32s
I1216 15:02:13.460036 1 request.go:665] Waited for 1.193477284s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/apps/v1?timeout=32s
I1216 15:02:23.509903 1 request.go:665] Waited for 1.24708191s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/console.openshift.io/v1?timeout=32s
{"_ts":"2022-12-16T15:02:26.120857569Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterlogforwarders.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}
I1216 15:02:33.525822 1 request.go:665] Waited for 1.149243647s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/template.openshift.io/v1?timeout=32s
I1216 15:02:43.560108 1 request.go:665] Waited for 1.295375528s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/cloudcredential.openshift.io/v1?timeout=32s
I1216 15:02:53.609422 1 request.go:665] Waited for 1.346344689s due to client-side throttling, not priority and fairness, request: GET:https://...:443/apis/metal3.io/v1alpha1?timeout=32s
{"_ts":"2022-12-16T15:03:14.781236971Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterlogforwarders.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}
{"_ts":"2022-12-16T15:03:14.781493976Z","_level":"0","_component":"cluster-logging-operator","_message":"Manager exited non-zero","_error":{"msg":"failed to wait for clusterlogging caches to sync: timed out waiting for cache to be synced"}}
Environment
- OpenShift 4.9
- cluster-logging-operator 5.5.5
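For what it's worth, the fatal message is not logging-specific: it is controller-runtime's generic error when a controller's informer caches never finish their initial sync before the per-controller timeout (two minutes by default in recent controller-runtime releases, which roughly matches the ~2.5 minutes between "Starting the Cmd." and the exit above). A minimal sketch of where that error surfaces, assuming a stock controller-runtime manager; this is illustrative, not the operator's actual wiring:

package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())
	log := ctrl.Log.WithName("setup")

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		log.Error(err, "unable to create manager")
		os.Exit(1)
	}

	// Each controller registered with the manager waits for its informer
	// caches to sync when Start runs; if that wait exceeds the controller's
	// CacheSyncTimeout, Start returns "failed to wait for <name> caches to
	// sync: timed out waiting for cache to be synced" and the process exits
	// non-zero -- which Kubernetes then reports as CrashLoopBackOff.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		log.Error(err, "Manager exited non-zero")
		os.Exit(1)
	}
}

If that reading is right, the crash would point at the clusterlogging controller's watches never completing their initial list/watch rather than at the two resources above.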
Kadams64 commented
Same problem here: OCP 4.11, cluster-logging-operator 5.5.5, exact same failure message, i.e.:
failed to wait for clusterlogging caches to sync: timed out waiting for cache to be synced
Note that this was not occurring prior to the update to 5.5.5 (applied via the operator subscription).