containers/conmon

Provide logging behavior policies applied by conmon to stdout/stderr

portante opened this issue · 18 comments

Today conmon is engaging in a byte-capture of stdout/stderr pipes, recording data as a series of JSON documents in a log file. All bytes from both file descriptors are captured without discrimination. We also see that conmon tries to write to disk as fast as possible (but sequentially) with what it reads from both pipes.

There is a natural back-pressure presented by conmon to the processes in the container writing to stdout/stderr that results from the rate the disk can accepts write()s. It is usually the case that all conmon processes server containers write to the same disk. Which means it is possible for "noisy" containers to take up much of the available disk bandwidth leading to unwanted impacts of one container on another.

Further, the rate at which containers write to disk is independent of the rate at which log file readers (or scrapers; e.g. fluentd, fluent-bit, rsyslog, syslog-ng, filebeat, etc.) can read. Because log file rotation is performed independent of the readers and conmon on most platforms, it is possible for conmon to write more data than what a reader can take in before a log file is deleted behind the reader's back with it ever seeing it.

These two problems suggest that log behavior policies applied by conmon to the log stream will give administrators the chance to solve them.

There are three policy behaviors we can begin to consider, each applied to stdout/stderr independently:

  1. back-pressure
  • Given a rate specified in "bytes per interval", stop reading from the given pipe if the number of bytes read so far exceeds the limit for that interval, continuing to read again at the beginning of the next interval
  • Suggested interval could just be 1 second by default, and/or allow it to be configured along with the # of bytes for that interval
  1. drop
  • Given a rate like specified for back-pressure, drop bytes read over the limit for the remainder of the interval, accepting them again at the start of the next interval
  1. ignore
  • Ignore all bytes read from a given pipe without writing them to disk.

This will allow an administrator to apply a policy to all conmon processes. Since conmon processes don't "know" about each other, coordination of the setting of that policy would have to be managed externally to conmon.

It is possible that conmon processes could periodically poll for configuration changes, or accept some sort of signal to re-evaluate the change.

@giuseppe Could you see about adding these flags.

I've a few questions.

  • Is ignore the equivalent of drop with a bytes-per-interval=0?

  • Are these limits supposed to be changed at runtime? e.g. can I change bytes-per-interval while the container is running?

  • How these settings will work for Podman/CRI-O? e.g. for CRi-O: should it be done globally or in an annotation?

mheon commented

I've a few questions.

* Is `ignore` the equivalent of `drop` with a bytes-per-interval=0?

@portante WDYT?

* Are these limits supposed to be changed at runtime?  e.g. can I change bytes-per-interval while the container is running?

No these should be constant for the container

* How these settings will work for Podman/CRI-O?  e.g. for CRi-O: should it be done globally or in an annotation?

CRI-O would need to use annotations until this gets support in upstream Kubernetes.
I agree with @mheon Podman would use --log-opt.

* How these settings will work for Podman/CRI-O?  e.g. for CRi-O: should it be done globally or in an annotation?

CRI-O would need to use annotations until this gets support in upstream Kubernetes.
I agree with @mheon Podman would use --log-opt.

While using annotations in CRI-O to set this would work, I'm not sure anyone would use it (us included) before it makes it upstream. conmon would still have to log in the old format to conform, and thus we'd have to have concurrent logging (the new and old way). It may make most sense to implement for podman, and propose in CRI, before we do work in CRI-O that may not be used or may not be really desired

Is ignore the equivalent of drop with a bytes-per-interval=0?

Certainly it could be implemented that way, but I was hoping we could make it so that at the --log-opt interface level drop with bytes-per-interval=0 is rejected. This way the intention is explicit instead of implicit: you want one of three behaviors, ignore (no logs at all), drop(logs if in bounds), back-pressure (all logs, with max bandwidth).

The other behavior we need is to be sure an SRE has the option to set the max bandwidth without the user increasing it, while still allowing the users to pick the behavior.

Example of the behavior described by @portante in https://bugzilla.redhat.com/show_bug.cgi?id=1741955. Container logging rates and container runtime log rotation policy can cause the logs to outstrip the log scraper's ability to keep up.

I'm adding two new log options to podman CLI in my POC, passed as part of --log-opt:

policy=backpressure|drop|ignore|passthrough

Peter has covered the first three, passthrough is current behavior with unrestricted logging.

rate-limit=RATE
Limit the transfer to a maximum of RATE bytes per second. A suffix of "K", "M", "G", or "T" can be
added to denote kibibytes (*1024), mebibytes, and so on.

@syedriko @portante Could one of you open a PR to add this feature?

@rhatdan A WIP PR: #92
Another PR in libpod for the CLI changes: containers/podman#4663

@syedriko I mean a PR to containers/common to add these limits.

what happens in the back-pressure case? the logs continue to accumulate in the pipe/buffer indefinitely(Assuming the log rate never drops)? can this cause memory consumption issues? who's memory budget does that come out of?

who's memory budget does that come out of?

By default, CRI-O puts conmon in the system.slice, but OCP ships with it in the pod slice, so it'd be charged to the pod

so it'd be charged to the pod

that sounds like the right place for it, so that's good.

what happens in the back-pressure case?

When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.

what happens in the back-pressure case?

When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.

One complication is that I think the logging rate should be specified separately for stdout from stderr. We should not lump the two together.

what happens in the back-pressure case?

When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.

One complication is that I think the logging rate should be specified separately for stdout from stderr. We should not lump the two together.

We can sure do that. With the PRs that are currently in-flight, I'm aiming at building a minimal working but end-to-end implementation we can play with, pick apart and make sure it makes sense.

@syedriko I mean a PR to containers/common to add these limits.

@rhatdan Could you give me a bit more details? I'm new to the containers code base and I'm not quite seeing how containers/common fits into what I'm doing.