Logging - possibility of losing logs
Let's say I have a container that logs heavily. The supported configuration from Red Hat uses JSON files under /var/log/containers, but this will eventually fill the whole filesystem, because those logs are only deleted after pod deletion. One way to combat this is to use max-size.
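For reference, with docker's json-file log driver the cap can be set daemon-wide via log-opts in /etc/docker/daemon.json; a minimal example (values are illustrative, and it only applies to containers created after the daemon is restarted):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
```

This caps disk usage per container, but as the scenario below shows, it does not by itself prevent unsent data from being dropped.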
Let's imagine this scenario (for demonstration, each log entry is 1 MB and max-size is 50 MB):
- the container's log file on the node is at 49.5 MB, fluentd's position is at EOF
- the container logs 1 MB
- the log file on the node is now 50.5 MB; fluentd reads it and tries to forward to ES, but some problem happens (network failure, ES down, whatever) -> no data has been sent
- the container logs 1 MB
- the docker daemon checks the log file against max-size before writing, so it truncates the file to 0 (https://github.com/moby/moby/blob/77faf158f5265711dbcbff0ffb855eed2e3b6ccd/daemon/logger/loggerutils/logfile.go#L174) and the unsent data is lost (see the sketch below)
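To make the failure mode concrete, here is a minimal, illustrative Go sketch of that kind of write-time size check (not the actual moby code; names and the single-file truncation are simplifications): the limit is enforced when the daemon writes, with no knowledge of how far a collector such as fluentd has read.

```go
package main

import (
	"fmt"
	"os"
)

// logFile mimics a daemon-side log writer with a size cap (max-size).
type logFile struct {
	f        *os.File
	size     int64
	capacity int64
}

// write enforces the cap at write time; nothing here knows whether a
// collector has already forwarded the existing bytes.
func (w *logFile) write(msg []byte) error {
	if w.capacity > 0 && w.size+int64(len(msg)) > w.capacity {
		// Simplified to truncating a single file; the document above notes
		// the linked logfile.go performs the equivalent check before writing.
		if err := w.f.Truncate(0); err != nil {
			return err
		}
		if _, err := w.f.Seek(0, 0); err != nil {
			return err
		}
		w.size = 0
	}
	n, err := w.f.Write(msg)
	w.size += int64(n)
	return err
}

func main() {
	f, err := os.CreateTemp("", "container-*.log")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())

	lf := &logFile{f: f, capacity: 50} // tiny "max-size" for demonstration
	lf.write([]byte("first entry, not yet forwarded by fluentd\n")) // fits under the cap
	lf.write([]byte("second entry pushes past the cap\n"))          // triggers truncation; the first entry is gone
	fmt.Println("bytes left on disk:", lf.size)
}
```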
The same idea applies to dead containers: k8s GC could delete a dead container (and its logs) before the data has been sent to ES (maximum-dead-containers-per-container, default value is 1).
Is there any way to truncate/rotate/delete logs from nodes based on acknowledgment from fluentd that the data has been successfully sent, or any idea how to make this work 100% and not lose a single log line?
> Is there any way to truncate/rotate/delete logs from nodes based on acknowledgment from fluentd that the data has been successfully sent, or any idea how to make this work 100% and not lose a single log line?
I don't know, but I would like to know. Have you tried asking the upstream fluentd community?
Note that OpenShift 4.x uses CRI-O instead of docker. CRI-O has max-size and rotation parameters, but I'm not sure how to configure them.
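In case it's useful, here is a sketch of where I believe the size knob lives (unverified; section and parameter names may differ between CRI-O versions). Rotation itself is, as far as I know, handled by the kubelet for CRI runtimes via containerLogMaxSize / containerLogMaxFiles in the KubeletConfiguration.

```toml
# /etc/crio/crio.conf (sketch; verify against your CRI-O version's docs)
[crio.runtime]
# Maximum size allowed for a container's log file, in bytes.
# Negative values mean no limit is imposed.
log_size_max = 52428800
```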
Note that logging 4.2 will support rsyslog in addition to fluentd.
@alanconway is there work here to be done on the collector side to resolve this or is this purely related to the runtime work you started?
@camabeh
> Is there any way to truncate/rotate/delete logs from nodes based on acknowledgment from fluentd that the data has been successfully sent, or any idea how to make this work 100% and not lose a single log line?
We strive to collect all logs from the system, but we make no guarantees.
This problem would likely be solved by a solution like the one proposed for conmon [1].
Closing; this is to be resolved by the implementation of containers/conmon#84.