openshift/origin-aggregated-logging

Logging - possibility of losing logs

Closed this issue · 7 comments

Let's say I have a container that logs heavily. The configuration supported by Red Hat uses JSON files under /var/log/containers, but these will eventually eat the whole filesystem, because the logs are only deleted after pod deletion. One way to combat this is to set max-size.
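For reference, with the json-file driver the cap is set through log-opts in docker's daemon.json; a minimal sketch (the 50m value is just this scenario's number, not a recommendation):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m"
  }
}
```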

Let's imagine this scenario (for demonstration, each log entry is 1 MB and max-size is 50 MB):

  1. the container's log on the node is 49.5 MB; fluentd's position is at EOF
  2. the container logs 1 MB
  3. the log on the node is now 50.5 MB; fluentd reads it and tries to forward to ES, but some problem occurs (network failure, ES down, whatever) -> no data has been sent
  4. the container logs 1 MB
  5. the docker daemon checks the log file against max-size before writing => it truncates the file to 0 (https://github.com/moby/moby/blob/77faf158f5265711dbcbff0ffb855eed2e3b6ccd/daemon/logger/loggerutils/logfile.go#L174)
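The race in steps 1-5 can be reproduced outside of docker in a few lines of Python; `append` plays the daemon (truncate-before-write) and `collect` plays fluentd with a position file. All names and sizes here are mine, purely illustrative:

```python
import os
import tempfile

MAX_SIZE = 50  # bytes; stands in for docker's max-size (tiny for the demo)

fd, path = tempfile.mkstemp()
os.close(fd)

def append(line):
    # Writer side: like the docker daemon, check the size *before* writing
    # and truncate the file to 0 when max-size is exceeded.
    if os.path.getsize(path) > MAX_SIZE:
        open(path, "w").close()  # rotation by truncation: unread bytes are gone
    with open(path, "a") as f:
        f.write(line + "\n")

pos = 0  # collector's saved byte offset, like a fluentd pos file

def collect():
    # Reader side: resume from the saved offset and return any new lines.
    global pos
    with open(path) as f:
        if pos > os.path.getsize(path):  # file shrank underneath us
            pos = 0                      # best effort: restart from the top
        f.seek(pos)
        data = f.read()
        pos = f.tell()
    return data.splitlines()

append("line-1" * 5)  # 31 bytes on disk
append("line-2" * 5)  # 62 bytes; the collector is behind (say, ES is down)
append("line-3" * 5)  # size check trips -> truncate to 0, then write line-3
got = collect()       # collector recovers, but line-1 and line-2 never made it
os.unlink(path)
```

After the last `append`, `collect()` returns only the line-3 entry: everything written while the collector was behind is unrecoverable, which is exactly the loss window described above.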

The same idea applies to dead containers: the k8s GC could delete a dead container before its data has been sent to ES (maximum-dead-containers-per-container defaults to 1).
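As a partial mitigation (not a real fix), the GC window can be widened with the kubelet's garbage-collection flags; an illustrative sketch (values are arbitrary):

```
--maximum-dead-containers-per-container=2
--minimum-container-ttl-duration=5m
```

This only lowers the probability of losing a dead container's logs; it does not tie deletion to collector acknowledgment.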

Is there any way to truncate/rotate/delete logs on nodes based on an acknowledgment from fluentd that the data has been successfully sent, or any idea how to make this work 100% so that not a single log line is lost?

richm commented

Is there any way to truncate/rotate/delete logs on nodes based on an acknowledgment from fluentd that the data has been successfully sent, or any idea how to make this work 100% so that not a single log line is lost?

I don't know, but I would like to know. Have you tried asking the upstream fluentd community?

Note that OpenShift 4.x uses CRI-O instead of docker. CRI-O has max-size and rotation parameters, but I'm not sure how to configure them.
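For what it's worth, on the kubernetes side the CRI-O log cap appears to be driven by the kubelet rather than the runtime; an unverified KubeletConfiguration sketch (field values are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 50Mi
containerLogMaxFiles: 3
```

Note this is still rotation by size, with no acknowledgment from the collector, so the loss window described above remains.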

Note that logging 4.2 will support rsyslog in addition to fluentd.

richm commented

@portante I think this is related to what you have been investigating.

@alanconway is there work here to be done on the collector side to resolve this or is this purely related to the runtime work you started?

@camabeh

Is there any way to truncate/rotate/delete logs on nodes based on an acknowledgment from fluentd that the data has been successfully sent, or any idea how to make this work 100% so that not a single log line is lost?

We strive to collect all logs from the system, but we make no guarantees.

This problem would likely be solved by a solution like the one proposed for conmon [1].

[1] containers/conmon#84

Closing; this issue is to be resolved by the implementation of containers/conmon#84.