Provide logging behavior policies applied by conmon to stdout/stderr
portante opened this issue · 18 comments
Today `conmon` is engaging in a byte-capture of the `stdout`/`stderr` pipes, recording data as a series of JSON documents in a log file. All bytes from both file descriptors are captured without discrimination. We also see that `conmon` tries to write to disk as fast as possible (but sequentially) with what it reads from both pipes.
There is a natural back-pressure presented by `conmon` to the processes in the container writing to `stdout`/`stderr` that results from the rate at which the disk can accept `write()`s. It is usually the case that all the `conmon` processes serving containers write to the same disk, which means it is possible for "noisy" containers to take up much of the available disk bandwidth, leading to unwanted impacts of one container on another.
Further, the rate at which containers write to disk is independent of the rate at which log file readers (or scrapers, e.g. fluentd, fluent-bit, rsyslog, syslog-ng, filebeat, etc.) can read. Because log file rotation is performed independently of the readers and of `conmon` on most platforms, it is possible for `conmon` to write more data than a reader can take in before a log file is deleted behind the reader's back, without the reader ever seeing that data.
These two problems suggest that log behavior policies applied by `conmon` to the log stream will give administrators the chance to solve them.
There are three policy behaviors we can begin to consider, each applied to `stdout`/`stderr` independently (a rough sketch of all three follows the list):
- `back-pressure`
  - Given a rate specified in "bytes per interval", stop reading from the given pipe if the number of bytes read so far exceeds the limit for that interval, continuing to read again at the beginning of the next interval
  - The interval could default to 1 second, and/or it could be configurable along with the number of bytes allowed per interval
- `drop`
  - Given a rate like the one specified for `back-pressure`, drop bytes read over the limit for the remainder of the interval, accepting them again at the start of the next interval
- `ignore`
  - Ignore all bytes read from a given pipe, without writing them to disk
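As a rough illustration of how these three behaviors could be wired into a per-stream read loop, here is a minimal C sketch. This is not conmon source; the `stream_state` struct, the `handle_readable()` helper, and the fixed one-second interval are assumptions made purely for illustration.

```c
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical policy enum and per-stream state; not actual conmon code. */
enum log_policy { POLICY_BACKPRESSURE, POLICY_DROP, POLICY_IGNORE };

struct stream_state {
    enum log_policy policy;
    uint64_t bytes_per_interval;   /* configured limit for this stream */
    uint64_t bytes_this_interval;  /* bytes read so far in the current interval */
    time_t   interval_start;       /* start of the current (1 second) interval */
};

/* Called when the stream's pipe fd is readable.  Returns the number of bytes
 * the caller should append to the log file; 0 means "log nothing right now". */
static ssize_t handle_readable(int fd, struct stream_state *st,
                               char *buf, size_t buflen)
{
    time_t now = time(NULL);
    if (now != st->interval_start) {   /* new interval: reset the counter */
        st->interval_start = now;
        st->bytes_this_interval = 0;
    }

    /* back-pressure: once over the limit, stop reading entirely; the pipe
     * fills and the container's write() blocks until the next interval. */
    if (st->policy == POLICY_BACKPRESSURE &&
        st->bytes_this_interval >= st->bytes_per_interval)
        return 0;

    ssize_t n = read(fd, buf, buflen);
    if (n <= 0)
        return n;                      /* EOF or error, handled by the caller */

    st->bytes_this_interval += (uint64_t)n;

    if (st->policy == POLICY_IGNORE)
        return 0;                      /* drain the pipe, never log */

    /* drop: keep reading, but discard everything over the limit until the
     * next interval begins (whole buffers, for simplicity of the sketch). */
    if (st->policy == POLICY_DROP &&
        st->bytes_this_interval > st->bytes_per_interval)
        return 0;

    return n;                          /* caller logs buf[0..n) */
}
```

In a real event loop, a stream that is over its back-pressure quota would also be removed from the poll set until the next interval, so the loop does not spin on a readable fd it refuses to read (a sketch of that appears near the end of the thread).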
This will allow an administrator to apply a policy to all `conmon` processes. Since `conmon` processes don't "know" about each other, coordination of the setting of that policy would have to be managed externally to `conmon`.

It is possible that `conmon` processes could periodically poll for configuration changes, or accept some sort of signal to re-evaluate the change.
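For the "accept some sort of signal" option, a minimal C sketch of what that could look like follows. conmon has no such mechanism today, and `reread_policy()` is a hypothetical helper named only for illustration.

```c
#include <signal.h>
#include <string.h>

/* Hypothetical "re-evaluate on signal" sketch; not actual conmon behavior.
 * The handler only sets a flag, which is async-signal-safe. */
static volatile sig_atomic_t reload_requested = 0;

static void on_sighup(int sig)
{
    (void)sig;
    reload_requested = 1;
}

static void install_reload_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sighup;
    sigaction(SIGHUP, &sa, NULL);
}

/* In the main loop, before each poll() iteration:
 *     if (reload_requested) { reload_requested = 0; reread_policy(&cfg); }
 * where reread_policy() is a hypothetical helper that re-reads the policy
 * settings managed externally to conmon. */
```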
I've a few questions.

- Is `ignore` the equivalent of `drop` with a bytes-per-interval=0?
- Are these limits supposed to be changed at runtime? e.g. can I change bytes-per-interval while the container is running?
- How will these settings work for Podman/CRI-O? e.g. for CRI-O: should it be done globally or in an annotation?
> Is `ignore` the equivalent of `drop` with a bytes-per-interval=0?

@portante WDYT?

> Are these limits supposed to be changed at runtime? e.g. can I change bytes-per-interval while the container is running?

No, these should be constant for the container.

> How will these settings work for Podman/CRI-O? e.g. for CRI-O: should it be done globally or in an annotation?

CRI-O would need to use annotations until this gets support in upstream Kubernetes. I agree with @mheon that Podman would use `--log-opt`.
> How will these settings work for Podman/CRI-O? e.g. for CRI-O: should it be done globally or in an annotation?
>
> CRI-O would need to use annotations until this gets support in upstream Kubernetes. I agree with @mheon that Podman would use `--log-opt`.

While using annotations in CRI-O to set this would work, I'm not sure anyone would use it (us included) before it makes it upstream. conmon would still have to log in the old format to conform, and thus we'd have to have concurrent logging (the new and the old way). It may make the most sense to implement this for Podman, and propose it in the CRI, before we do work in CRI-O that may not be used or may not really be desired.
> Is `ignore` the equivalent of `drop` with a bytes-per-interval=0?

Certainly it could be implemented that way, but I was hoping we could make it so that, at the `--log-opt` interface level, `drop` with `bytes-per-interval=0` is rejected. This way the intention is explicit instead of implicit: you want one of three behaviors, `ignore` (no logs at all), `drop` (logs if in bounds), or `back-pressure` (all logs, with a maximum bandwidth).

The other behavior we need is to be sure an SRE has the option to set the max bandwidth without the user being able to increase it, while still allowing the users to pick the behavior.
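A tiny sketch of what that explicit rejection could look like at option-validation time, reusing the hypothetical `log_policy` enum from the earlier sketch; this is not code from any in-flight PR.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Reject the implicit "drop everything" combination so the operator has to
 * ask for the ignore policy explicitly. */
static bool validate_log_opts(enum log_policy policy, uint64_t bytes_per_interval)
{
    if (policy == POLICY_DROP && bytes_per_interval == 0) {
        fprintf(stderr, "drop with bytes-per-interval=0 is not allowed; "
                        "use policy=ignore if no logs are wanted\n");
        return false;
    }
    return true;
}
```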
An example of the behavior described by @portante is in https://bugzilla.redhat.com/show_bug.cgi?id=1741955: container logging rates and the container runtime's log rotation policy can cause the logs to outstrip the log scraper's ability to keep up.
I'm adding two new log options to the Podman CLI in my POC, passed as part of `--log-opt`:

- `policy=backpressure|drop|ignore|passthrough`: Peter has covered the first three; `passthrough` is the current behavior with unrestricted logging.
- `rate-limit=RATE`: Limit the transfer to a maximum of RATE bytes per second. A suffix of "K", "M", "G", or "T" can be added to denote kibibytes (*1024), mebibytes, and so on (a rough parsing sketch follows the list).
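For illustration, here is a minimal C sketch of parsing such a RATE value, assuming the kibibyte-based suffixes described above. This is not the code from the WIP PR, and overflow checking is omitted for brevity.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "NUMBER" with an optional K/M/G/T suffix into bytes per second.
 * Returns 0 on success, -1 on a malformed value. */
static int parse_rate(const char *s, uint64_t *out)
{
    char *end = NULL;
    errno = 0;
    unsigned long long val = strtoull(s, &end, 10);
    if (errno != 0 || end == s)
        return -1;                   /* not a number */

    uint64_t mult = 1;
    switch (*end) {
    case '\0': break;
    case 'K': case 'k': mult = 1024ULL; end++; break;
    case 'M': case 'm': mult = 1024ULL * 1024; end++; break;
    case 'G': case 'g': mult = 1024ULL * 1024 * 1024; end++; break;
    case 'T': case 't': mult = 1024ULL * 1024 * 1024 * 1024; end++; break;
    default:  return -1;             /* unknown suffix */
    }
    if (*end != '\0')
        return -1;                   /* trailing garbage after the suffix */

    *out = (uint64_t)val * mult;     /* bytes per second */
    return 0;
}
```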
@rhatdan A WIP PR: #92
Another PR in libpod for the CLI changes: containers/podman#4663
What happens in the back-pressure case? Do the logs continue to accumulate in the pipe/buffer indefinitely (assuming the log rate never drops)? Can this cause memory consumption issues? Whose memory budget does that come out of?
> Whose memory budget does that come out of?

By default, CRI-O puts conmon in the system.slice, but OCP ships with it in the pod slice, so it'd be charged to the pod.
> so it'd be charged to the pod

That sounds like the right place for it, so that's good.
> What happens in the back-pressure case?

When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.
> What happens in the back-pressure case?
>
> When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.

One complication is that I think the logging rate should be specified separately for stdout and stderr. We should not lump the two together.
> One complication is that I think the logging rate should be specified separately for stdout and stderr. We should not lump the two together.

We can sure do that. With the PRs that are currently in flight, I'm aiming at building a minimal but working end-to-end implementation we can play with, pick apart, and make sure it makes sense.
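To make the blocking behavior and the per-stream split concrete, here is one more illustrative C sketch, reusing the hypothetical `stream_state` type and `POLICY_BACKPRESSURE` value from earlier in the thread; it is not conmon source. A stream that is over its back-pressure quota is simply left out of the poll set, the kernel pipe buffer (64 KiB by default on Linux) fills up, and the container's next `write()` to that stream blocks until conmon resumes reading.

```c
#include <poll.h>
#include <stdbool.h>

/* True when a back-pressure stream has exhausted its quota for the interval. */
static bool over_quota(const struct stream_state *st)
{
    return st->policy == POLICY_BACKPRESSURE &&
           st->bytes_this_interval >= st->bytes_per_interval;
}

/* Build the poll set for one loop iteration.  stdout and stderr are tracked
 * with separate stream_state instances so their rates can differ. */
static int build_pollfds(struct pollfd *pfds,
                         int stdout_fd, const struct stream_state *out_state,
                         int stderr_fd, const struct stream_state *err_state)
{
    int n = 0;
    if (!over_quota(out_state)) {      /* only poll streams we are willing to read */
        pfds[n].fd = stdout_fd;
        pfds[n].events = POLLIN;
        n++;
    }
    if (!over_quota(err_state)) {
        pfds[n].fd = stderr_fd;
        pfds[n].events = POLLIN;
        n++;
    }
    return n;   /* caller does poll(pfds, n, ms_until_next_interval) */
}
```

Tracking stdout and stderr with separate per-stream state also leaves room for giving each its own rate, as suggested above.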