Update fluent.conf to collect pod logs that are uncompressed and rotated by kubelet
Closed this issue · 2 comments
Describe the bug
Kubelet rotates the latest log file without compression [1], so that the container can keep writing while fluentd finishes reading. However, the OpenShift fluent.conf does not account for this: its path filter is limited to "/var/log/pods/*/*/*.log".
Environment
- All fluentd versions shipping this fluent.conf
Logs
Pod logs directory has two files available.
# ls /var/log/pods/*/flog-container/*
-rw-------. 1 root root 419M Sep 23 18:43 /var/log/pods/flog_flog-deployment-d97b4b954-ct276_f4639828-4c79-4785-8ef4-55ab9e4002ea/flog-container/0.log
-rw-------. 1 root root 1.1G Sep 23 17:47 /var/log/pods/flog_flog-deployment-d97b4b954-ct276_f4639828-4c79-4785-8ef4-55ab9e4002ea/flog-container/0.log.20230923-174723
However, fluentd is reading only 0.log because of the path filter /var/log/pods/*/*/*.log in fluent.conf:
lsof /var/log/pods/*/flog-container/*
fluentd 3223626 root 2128r REG 252,4 438032950 163578597 /var/log/pods/flog_flog-deployment-d97b4b954-ct276_f4639828-4c79-4785-8ef4-55ab9e4002ea/flog-container/0.log
Because fluentd does not account for kubelet's uncompressed rotation, logs are missed during rotation in high-volume environments.
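The mismatch is easy to demonstrate with shell-style glob matching. The sketch below (file names taken from the listing above; the `matched` helper is a hypothetical illustration, not part of fluentd) shows that the current `*.log` filter skips the rotated file, while a pair of patterns plus a `.gz` exclusion would pick it up without ingesting compressed archives:

```python
from fnmatch import fnmatch

current = "0.log"
rotated = "0.log.20230923-174723"
compressed = "0.log.20230922-091500.gz"  # hypothetical older archive

# The existing fluent.conf filter matches only the live file.
assert fnmatch(current, "*.log")
assert not fnmatch(rotated, "*.log")

def matched(name):
    """Broader pattern pair: live file or rotated file, but never .gz archives."""
    return (fnmatch(name, "*.log") or fnmatch(name, "*.log.*")) \
        and not name.endswith(".gz")

assert matched(current)
assert matched(rotated)
assert not matched(compressed)
print("patterns behave as expected")
```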
Expected behavior
Fluentd should read all pod logs, including the uncompressed rotated files.
Actual behavior
Fluentd does not read the uncompressed rotated log files.
To Reproduce
Steps to reproduce the behavior:
- Deploy the flog container to generate fake logs:
$ cat flog-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flog-deployment
  namespace: flog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flog
  template:
    metadata:
      labels:
        app: flog
    spec:
      containers:
      - name: flog-container
        image: quay.io/deployments/flog:latest
        env:
        - name: OPT
          value: "-d1ms -l"
$ oc apply -f flog-deploy.yaml
- Wait for the log rotation to happen.
- Fluentd won't read the uncompressed log file rotated by kubelet; it keeps reading only 0.log.
Additional context
This problem is especially painful in high-volume environments, which keep losing application logs.
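A hedged sketch of how the tail source could be widened to cover rotated-but-uncompressed files. The surrounding parameters (pos_file path, tag, parser) are illustrative placeholders and should be checked against the actual OpenShift fluent.conf; the key change is the second path glob and the `.gz` exclusion:

```
<source>
  @type tail
  # Match both the live file and kubelet's uncompressed rotations,
  # e.g. 0.log and 0.log.20230923-174723.
  path "/var/log/pods/*/*/*.log,/var/log/pods/*/*/*.log.*"
  # Skip archives that kubelet has already compressed.
  exclude_path ["/var/log/pods/*/*/*.gz"]
  pos_file /var/lib/fluentd/pos/containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type none
  </parse>
</source>
```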
Fluentd in fact does read these logs, because it tracks files by inode rather than by name. The primary cause of log loss is that fluentd cannot keep up with the load under certain conditions, especially when many containers log at high volume with large messages. I suggest using our Vector deployment, which has better throughput in general.