Fluentd to external Elasticsearch stops sending logs
Closed this issue · 10 comments
After some issues getting Fluentd and ES stable (thank you Jeff), I updated Fluentd to send logs to an external log server (Nagios Log Server - ELK stack). Logs are sent for about a day and then stop. There are no apparent errors that I can find. Restarting the logging-fluentd daemonset works without issue, but only a portion of the nodes resume sending logs. Not sure if I should stop the ES and Kibana pods. (Won't lie, getting 32 GB of memory resources back sure would be nice!)
Logging-fluentd appended configuration:
- name: USE_REMOTE_SYSLOG
value: 'true'
- name: REMOTE_SYSLOG_HOST
value: IP member 2
- name: REMOTE_SYSLOG_HOST_BACKUP
value: IP member 2
- name: REMOTE_SYSLOG_PORT
value: '5544'
- name: REMOTE_SYSLOG_PORT_BACKUP
value: '5544'
- name: REMOTE_SYSLOG_TYPE
value: syslog
I did notice the appended configuration wasn't happy with a round-robin DNS lookup for REMOTE_SYSLOG_HOST, so it was updated with each member's IP separately.
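For reference, a minimal sketch of how these variables can be appended with oc set env (the logging-fluentd daemonset name, the openshift-logging project, and the member IPs below are assumptions/placeholders, not my exact command):

# Sketch only: append the remote syslog variables to the fluentd daemonset.
oc -n openshift-logging set env daemonset/logging-fluentd \
  USE_REMOTE_SYSLOG=true \
  REMOTE_SYSLOG_HOST=<member-1-ip> \
  REMOTE_SYSLOG_HOST_BACKUP=<member-2-ip> \
  REMOTE_SYSLOG_PORT=5544 \
  REMOTE_SYSLOG_PORT_BACKUP=5544 \
  REMOTE_SYSLOG_TYPE=syslog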
What version of logging is this?
After some issues getting Fluentd and ES stable (thank you Jeff), I updated Fluentd to send logs to an external log server (Nagios Log Server - ELK stack).
The external log server is an Elasticsearch cluster? If so, have you reached the watermark thresholds where it stops accepting data? This statement does not jibe with enabling syslog unless they have some agent that accepts syslog and then forwards it to ES.
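If it is Elasticsearch, a quick way to check the configured thresholds is to query the cluster settings directly, for example (host/port are placeholders; add auth if your cluster requires it):

# Sketch: show the disk watermark settings on the external ES.
curl -s 'http://<es-host>:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty' | grep -i watermark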
3.11
quay.io/openshift/origin-logging-fluentd v3.11.0 3643ce22b369
Watermark set at 90% and elasticsearch looks healthy
/var/log/elasticsearch/f49b1b36-537c-44f7-803e-47799a7e1bf4.log:[2020-12-16 23:04:48,115][INFO ][cluster.routing.allocation.decider] [6ec96358-bce3-4fab-be93-f4b8d844100d] updating [cluster.routing.allocation.disk.watermark.low] to [90%]
Watermark set at 90% and elasticsearch looks healthy
/var/log/elasticsearch/f49b1b36-537c-44f7-803e-47799a7e1bf4.log:[2020-12-16 23:04:48,115][INFO ][cluster.routing.allocation.decider] [6ec96358-bce3-4fab-be93-f4b8d844100d] updating [cluster.routing.allocation.disk.watermark.low] to [90%]
Doesn't this speak to the setting, not the actual amount of disk used?
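A rough way to check the actual usage against that threshold (placeholder host/port, same caveats as above):

# Sketch: per-node disk usage as Elasticsearch sees it.
curl -s 'http://<es-host>:9200/_cat/allocation?v'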
So I reverted back to the OKD EFK stack. It's running and looks good, but it's not getting the logs. I did notice this in the pod logs:
'block' action stops input process until the buffer full is resolved
I followed the steps to clear the /var/lib/fluentd files. No change in Kibana.
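For anyone following along, a sketch of those steps (the openshift-logging namespace and the logging-infra-fluentd node selector label are assumptions from a default 3.11 install, so verify against your deployment):

# Sketch: stop fluentd on the node, clear the buffered chunks, re-enable it.
oc label node <node-name> logging-infra-fluentd=false --overwrite
# then, on the node itself:
rm -rf /var/lib/fluentd/*
oc label node <node-name> logging-infra-fluentd=true --overwrite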
If I open the kibana-ops UI I see logs; opening the kibana UI, nothing.
Before I start playing with the filters, shouldn't the default config collect the logs?
To your previous point about disk (though I've abandoned that for now): disk is fine. I'm using AWS storage through an AWS Storage Gateway appliance with a 150 GB cache, and monitoring shows I'm nowhere near stressing it.
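To check whether the project indices are actually receiving documents (as opposed to only the .operations indices behind kibana-ops), something like the following against the internal ES pod should show document counts per index (the pod label and the es_util helper are assumptions based on the stock origin-aggregated-logging image):

# Sketch: list indices and document counts on the internal Elasticsearch.
ES_POD=$(oc -n openshift-logging get pods -l component=es -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- es_util --query='_cat/indices?v'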
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Let me monitor this a bit longer and get back to you with any further concerns or comments. Thx!
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.