tsaikd/gogstash

Back pressure handling document


Hi

I would like to know what happens if the output is not available (e.g. Elasticsearch has a full disk).
What is the buffering mechanism, and are there any parameters for it (like max_disk_buffer_size, max_memory_buffer_size)?

Thanks

It depends on the configuration of the olivere library (the Elasticsearch client); you can check the details there.

My guess is that it retries with a backoff algorithm, depending on the status codes returned.

If you look at https://github.com/tsaikd/gogstash/blob/master/output/elastic/outputelastic.go, you will see the default buffer size parameters. These are used during normal operation.

When there are issues, one of three things can happen:

  1. The message is not accepted and dropped.
  2. The message is declined and requeued. By default (see line 21 of that file), it retries messages for which Elasticsearch returns HTTP status code 408, 429, 503, or 507.
  3. The message is not delivered and remains in the queue.

From what I can see, under an error condition with new messages still coming in, the output will over time consume all available memory and gogstash will crash. I can't find any place where an undelivered message is dropped. The parameter ExponentialBackoffMaxTimeout does not seem to drop messages over time either.

I tested with two error conditions; Elasticsearch down and Elasticsearch in read-only mode (high watermark/disk util).

Should we discuss implementing some kind of back pressure mechanism where the output can pause the input? Many inputs can support this, while others have issues with it.

You can have many strategies for handling delivery errors, and in my opinion, an important thing for a logging system is keeping the integrity of the logs. If you drop messages somewhere, you might miss a chance to be notified of an unstable system state. Depending on the case, you may sometimes want to drop messages to prevent memory exhaustion.
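Both strategies could be exposed as a per-output choice. The sketch below assumes a hypothetical `OverflowPolicy` setting (not an existing gogstash option): `Block` keeps every message and makes the input wait, while `DropNewest` bounds memory but counts every drop, so the loss is still visible and can be alerted on.

```go
package main

import "sync/atomic"

// OverflowPolicy is a hypothetical setting for what to do when the buffer is full.
type OverflowPolicy int

const (
	Block      OverflowPolicy = iota // keep every message; the input waits (log integrity)
	DropNewest                       // discard incoming messages when full (bounded memory)
)

// Buffer is a bounded queue between input and output.
type Buffer struct {
	queue   chan string
	policy  OverflowPolicy
	dropped int64 // number of messages discarded; read with Dropped()
}

func NewBuffer(capacity int, policy OverflowPolicy) *Buffer {
	return &Buffer{queue: make(chan string, capacity), policy: policy}
}

// Push enqueues msg according to the policy. It returns false only when
// the message was dropped.
func (b *Buffer) Push(msg string) bool {
	if b.policy == Block {
		b.queue <- msg // waits for the output to drain: back pressure
		return true
	}
	select {
	case b.queue <- msg:
		return true
	default:
		// Count the drop so the unstable state is still reported somewhere.
		atomic.AddInt64(&b.dropped, 1)
		return false
	}
}

// Dropped returns how many messages have been discarded so far.
func (b *Buffer) Dropped() int64 {
	return atomic.LoadInt64(&b.dropped)
}
```

Exporting the drop counter as a metric (or logging it periodically) is one way to reconcile the two concerns: memory stays bounded, but operators are not left unaware that logs were lost.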