Fluentd to external Elasticsearch stops sending logs
Closed this issue · 10 comments
After some issues getting Fluentd and ES stable (thank you Jeff), I updated Fluentd to send logs to an external log server (Nagios Log Server - ELK stack). Logs are sent for about a day and then stop. There are no apparent errors that I can find. Restarting the logging-fluentd daemonset works without issue, but only a portion of the nodes resume sending logs. Not sure if I should stop the ES and Kibana pods. (Won't lie, getting 32 GB of memory resources back sure would be nice!)
Logging-fluentd appended configuration:
- name: USE_REMOTE_SYSLOG
value: 'true'
- name: REMOTE_SYSLOG_HOST
value: IP member 2
- name: REMOTE_SYSLOG_HOST_BACKUP
value: IP member 2
- name: REMOTE_SYSLOG_PORT
value: '5544'
- name: REMOTE_SYSLOG_PORT_BACKUP
value: '5544'
- name: REMOTE_SYSLOG_TYPE
value: syslog
I did notice the appended configuration wasn't happy with a round-robin DNS lookup for REMOTE_SYSLOG_HOST, so it was updated with each member's IP separately.
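For reference, a minimal sketch of how these variables can be appended with oc set env (the logging-fluentd daemonset name, the openshift-logging project, and the member IPs below are assumptions/placeholders, not my exact command):

# Sketch only: append the remote syslog variables to the fluentd daemonset.
oc -n openshift-logging set env daemonset/logging-fluentd \
  USE_REMOTE_SYSLOG=true \
  REMOTE_SYSLOG_HOST=<member-1-ip> \
  REMOTE_SYSLOG_HOST_BACKUP=<member-2-ip> \
  REMOTE_SYSLOG_PORT=5544 \
  REMOTE_SYSLOG_PORT_BACKUP=5544 \
  REMOTE_SYSLOG_TYPE=syslog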
What version of logging is this?
After some issues getting Fluentd and ES stable (thank you Jeff), I updated Fluentd to send logs to an external log server (Nagios Log Server - ELK stack).
The external log server is an Elasticsearch cluster? If so, have you reached the watermark thresholds where it stops accepting data? This statement does not jibe with enabling syslog unless they have some agent that accepts syslog and then forwards it to ES.
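If it is Elasticsearch, a quick way to check the configured thresholds is to query the cluster settings directly, for example (host/port are placeholders; add auth if your cluster requires it):

# Sketch: show the disk watermark settings on the external ES.
curl -s 'http://<es-host>:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty' | grep -i watermark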
3.11
quay.io/openshift/origin-logging-fluentd v3.11.0 3643ce22b369
Watermark set at 90% and elasticsearch looks healthy
/var/log/elasticsearch/f49b1b36-537c-44f7-803e-47799a7e1bf4.log:[2020-12-16 23:04:48,115][INFO ][cluster.routing.allocation.decider] [6ec96358-bce3-4fab-be93-f4b8d844100d] updating [cluster.routing.allocation.disk.watermark.low] to [90%]
Watermark set at 90% and elasticsearch looks healthy
/var/log/elasticsearch/f49b1b36-537c-44f7-803e-47799a7e1bf4.log:[2020-12-16 23:04:48,115][INFO ][cluster.routing.allocation.decider] [6ec96358-bce3-4fab-be93-f4b8d844100d] updating [cluster.routing.allocation.disk.watermark.low] to [90%]
Doesn't this speak to the setting, not the actual amount of disk used?
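A rough way to check the actual usage against that threshold (placeholder host/port, same caveats as above):

# Sketch: per-node disk usage as Elasticsearch sees it.
curl -s 'http://<es-host>:9200/_cat/allocation?v'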
So I reverted back to the OKD EFK stack. It's running and looks good, but it's not getting the logs. I did notice this in the pod logs:
'block' action stops input process until the buffer full is resolved
I followed the steps to clear the /var/lib/fluentd files. No change in Kibana.
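For anyone following along, a sketch of those steps (the openshift-logging namespace and the logging-infra-fluentd node selector label are assumptions from a default 3.11 install, so verify against your deployment):

# Sketch: stop fluentd on the node, clear the buffered chunks, re-enable it.
oc label node <node-name> logging-infra-fluentd=false --overwrite
# then, on the node itself:
rm -rf /var/lib/fluentd/*
oc label node <node-name> logging-infra-fluentd=true --overwrite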
If I open the kibana-ops UI I see logs; opening the kibana UI, nothing.
Before I start playing with the filters, shouldn't the default config collect the logs?
To your previous point about disk (though I've abandoned that for now): disk is fine. I'm using AWS storage through an AWS Storage Gateway appliance with a 150 GB cache, and monitoring shows I'm nowhere near stressing it.
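To check whether the project indices are actually receiving documents (as opposed to only the .operations indices behind kibana-ops), something like the following against the internal ES pod should show document counts per index (the pod label and the es_util helper are assumptions based on the stock origin-aggregated-logging image):

# Sketch: list indices and document counts on the internal Elasticsearch.
ES_POD=$(oc -n openshift-logging get pods -l component=es -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- es_util --query='_cat/indices?v'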
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Let me monitor this a bit longer and get back to you with any further concerns or comments. Thx!
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.