openshift/cluster-logging-operator

Collector pods don't restart automatically once stuck on error

Closed this issue · 1 comment

Describe the bug
Hello,

I am using the cluster logging operator 5.x with Vector on OKD to forward logs to an external SIEM solution. However, if the external SIEM goes down and then comes back up, the collector pod stays in the error state "connection error with output" until it is restarted manually. Is there a way to add a health check to the collector daemonset so that it recovers automatically once the ClusterLogForwarder output is back up?
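For reference, a restart-on-failure behavior like the one requested could in principle be expressed as a Kubernetes liveness probe on the collector container. The sketch below is a hypothetical patch, not a supported operator feature: the `/health` path and port `8686` assume Vector's internal API is enabled (it may not be in the configuration Logging 5.5 generates), and the operator reconciles the daemonset, so a manual patch may be reverted.

```yaml
# Hypothetical livenessProbe on the collector DaemonSet: kubelet restarts
# the container when Vector's API health endpoint stops responding.
# Assumes the Vector api is enabled on port 8686 -- verify in the
# generated vector.toml before relying on this.
spec:
  template:
    spec:
      containers:
        - name: collector
          livenessProbe:
            httpGet:
              path: /health
              port: 8686
            initialDelaySeconds: 30
            periodSeconds: 60
            failureThreshold: 3
```

Note that this probe only checks whether the Vector process is responsive, not whether a particular sink is connected, so it would not necessarily catch the "connection error with output" state described above.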

Environment

  • Versions of OpenShift, Cluster Logging and any other relevant components
  • ClusterLogging instance

OKD 4.13
Operator version 5.5

Logs
I restarted the pod and could not save the error logs.

Expected behavior
The collector pods should resume sending logs to the output once the output is back online.

Actual behavior
The collector pods get stuck in an error state and need a manual restart.

To Reproduce
Steps to reproduce the behavior:

  1. Take the output (to which you are forwarding logs) down for some time.
  2. Observe the error logs in the collector pods.
  3. Bring the output back up and observe that the collector pods remain stuck in the error state.


Closing as obsolete. Logging 5.5 used an earlier version of Vector. I encourage you to upgrade to the latest release, which brings in a number of improvements.