kiwigrid/helm-charts

[fluentd-elasticsearch] error_class=Fluent::Plugin::ConcatFilter::TimeoutError is causing dropped messages

Closed this issue · 0 comments

Is this a request for help?:

No.

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug.

With the current configuration, fluentd uses the concat plugin to merge multi-line messages into one.
Unfortunately, the plugin is not properly configured, which results in some messages being dropped.

Message error example:

[warn]: dump an error event: error_class=Fluent::Plugin::ConcatFilter::TimeoutError error="Timeout flush: node-problem-detector:default" location=nil ...

This means some messages never reach elasticsearch, but they should - even if they are not fully parsed.

Pros:

  • getting all source messages
  • becoming aware of issues that were not previously visible via elasticsearch

Cons:

  • some messages may be concatenated and still look ugly ;)
  • ignorance is no longer bliss /s ;)

Version of Helm and Kubernetes:
helm 2.14.3
k8s 1.14.9-gke.2

Which chart in which version:
fluentd-elasticsearch
5.3.1

What happened:
I was trying to debug issues with some addons and noticed that some specific logs were missing, although they were available on the host (hello journalctl).
The error message above was also quite frequent in the wercker/stern output.
Then I discovered that the messages behind those errors are not available in elasticsearch at all. gasp

Example - run https://github.com/wercker/stern against your fluentd-elasticsearch pods and see how many messages with that error show up, pick one, then try to find its specific text in elasticsearch ... vince-vega.gif

What you expected to happen:
If the fluentd concat plugin generates an error due to a timeout in processing, the message should still be processed by the other filters and thus end up in elasticsearch (even if not fully parsed).

How to reproduce it (as minimally and precisely as possible):
use helm chart defaults

Anything else we need to know:

I can provide PRs with fixes.
This would implement the approach described under "Handle timeout log lines the same as normal logs" in https://github.com/fluent-plugins-nursery/fluent-plugin-concat#usage.
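
A minimal sketch of what that could look like, adapted from the pattern in the fluent-plugin-concat README. The `kubernetes.**` tag pattern, the `message` key, the `multiline_end_regexp` value, the `@NORMAL` label name, and the `stdout` output are assumptions for illustration only, not the chart's actual values:

```
<filter kubernetes.**>
  @type concat
  key message
  multiline_end_regexp /\n$/
  separator ""
  # Assumption: send timed-out buffers to @NORMAL instead of emitting a TimeoutError event
  timeout_label @NORMAL
</filter>

# Route normally concatenated events to the same label
<match kubernetes.**>
  @type relabel
  @label @NORMAL
</match>

<label @NORMAL>
  # The remaining filters and the real elasticsearch output would live here
  <match kubernetes.**>
    @type stdout
  </match>
</label>
```

Because events sent to a label skip any top-level filters that follow the concat filter, the rest of the pipeline probably has to move inside the label so that timed-out and normal messages are handled the same way.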