Objects delivered in the wrong path in S3
max-blue opened this issue · 5 comments
Describe the question/issue
I replaced Fluentd with Fluent Bit to ship logs from K8S to S3. The rewrite_tag config I have is not working as expected, and logs are pushed to the wrong path in S3. Logs are expected to be pushed to the following path:
2023/10/14/namespace/container_name/container-name-namespace_name-2023-10-14-UUID.txt
but they are pushed to the following path instead:
2023/10/14/var/log/containers/containers-var-20231014-0759-.log-object00N1PX3n
Configuration
fluentbitS3:
  enabled: true
  values:
    kind: DaemonSet
    image:
      repository: cr.fluentbit.io/fluent/fluent-bit
      pullPolicy: Always
    env:
      - name: CLUSTER
        value: company-cluster
    testFramework:
      enabled: true
      image:
        repository: busybox
        pullPolicy: Always
        tag: latest
    serviceAccount:
      create: true
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::1234567890:role/company-cluster-fluentbit
    rbac:
      create: true
      nodeAccess: true
    hostNetwork: false
    dnsPolicy: ClusterFirst
    service:
      type: ClusterIP
      port: 2022
    livenessProbe:
      httpGet:
        path: /
        port: http
    readinessProbe:
      httpGet:
        path: /api/v1/health
        port: http
    flush: 1
    metricsPort: 2022
    config:
      service: |
        [SERVICE]
            Daemon Off
            Flush {{ .Values.flush }}
            Log_Level {{ .Values.logLevel }}
            Parsers_File parsers.conf
            Parsers_File custom_parsers.conf
            HTTP_Server On
            HTTP_Listen 0.0.0.0
            HTTP_Port {{ .Values.metricsPort }}
            Health_Check On
      inputs: |
        [INPUT]
            Name tail
            Tag s3logs.*
            Path /var/log/containers/*.log
            parser cri
            multiline.parser cri
            Mem_Buf_Limit 5MB
            Skip_Long_Lines On
            Skip_Empty_Lines On
            Refresh_Interval 10
        [FILTER]
            Name kubernetes
            Match s3logs.*
            Kube_Tag_Prefix kube.var.log.containers.
            Merge_Log On
            K8S-Logging.Parser On
            K8S-Logging.Exclude On
            Keep_Log Off
            Labels Off
            Annotations Off
        [FILTER]
            Name record_modifier
            Match s3logs.*
            Record cluster_name ${CLUSTER}
        [FILTER]
            name lua
            alias set_std_keys
            match s3logs.*
            script /fluent-bit/scripts/s3_path.lua
            call set_std_keys
        [FILTER]
            name rewrite_tag
            match s3logs.*
            rule $log ^.*$ s3.$namespace_name.$app_name.$container_name.$pod_id true
      outputs: |
        [OUTPUT]
            Name s3
            Match s3logs.*
            bucket logs.company-cluster.us-east-1.company.com
            region us-east-1
            s3_key_format /%Y/%m/%d/$TAG[1]/$TAG[2]/$TAG[3]/$TAG[3]-$TAG[1]-%Y%m%d-%H%M-${podid}-%{index}.%{file_extension}
            store_dir /var/log/fluentbit-s3-buffers
            total_file_size 256MB
            upload_timeout 2m
            use_put_object On
            compression gzip
            preserve_data_ordering On
    volumeMounts:
      - name: config
        mountPath: /fluent-bit/etc/fluent-bit.conf
        subPath: fluent-bit.conf
      - name: script
        mountPath: /fluent-bit/etc/s3_path.lua
        subPath: s3_path.lua
      - name: buffers
        mountPath: /var/log/fluentbit-s3-buffers
    daemonSetVolumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: buffers
        emptyDir: {}
    daemonSetVolumeMounts:
      - name: varlog
        mountPath: /var/log
    logLevel: info
    luaScripts:
      s3_path.lua: |
        function set_std_keys(tag, timestamp, record)
            -- Pull up cluster
            if (record["cluster_name"] ~= nil) then
                record["cluster_name"] = record["cluster_name"]
            else
                record["cluster_name"] = "company-cluster"
            end
            if (record["kubernetes"] ~= nil) then
                kube = record["kubernetes"]
                -- Pull up namespace
                if (kube["namespace_name"] ~= nil and string.len(kube["namespace_name"]) > 0) then
                    record["namespace_name"] = kube["namespace_name"]
                else
                    record["namespace_name"] = "default"
                end
                -- Pull up container name
                if (kube["container_name"] ~= nil and string.len(kube["container_name"]) > 0) then
                    record["container_name"] = kube["container_name"]
                end
                -- Pull up pod id
                if (kube["pod_id"] ~= nil and string.len(kube["pod_id"]) > 0) then
                    record["pod_id"] = kube["pod_id"]
                end
                -- Pull up app name (Deployment, StatefulSet, DaemonSet, Job, CronJob etc)
                if (kube["labels"] ~= nil) then
                    labels = kube["labels"]
                    if (labels["app"] ~= nil and string.len(labels["app"]) > 0) then
                        record["app_name"] = labels["app"]
                    elseif (labels["app.kubernetes.io/instance"] ~= nil and string.len(labels["app.kubernetes.io/instance"]) > 0) then
                        record["app_name"] = labels["app.kubernetes.io/instance"]
                    elseif (labels["k8s-app"] ~= nil and string.len(labels["k8s-app"]) > 0) then
                        record["app_name"] = labels["k8s-app"]
                    elseif (labels["name"] ~= nil and string.len(labels["name"]) > 0) then
                        record["app_name"] = labels["name"]
                    end
                else
                    record["app_name"] = record["app_name"]
                end
            end
            return 2, timestamp, record
        end
Fluent Bit Log Output
[2023/10/18 07:39:14] [debug] [input chunk] update output instances with new chunk size diff=346, records=1, input=emitter_for_rewrite_tag.3
[2023/10/18 07:39:14] [debug] [input chunk] update output instances with new chunk size diff=346, records=1, input=tail.0
[2023/10/18 07:39:14] [debug] [input:tail:tail.0] inode=203429971, /var/log/containers/jumeirah-d647cd446-pm9z9_production_jumeirah-eb74b3c1f3a47ec7d5c922065122784847a5f31060917ae22633bdfca68907bd.log, events: IN_MODIFY
[2023/10/18 07:39:14] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2023/10/18 07:39:15] [debug] [http_client] not using http_proxy for header
[2023/10/18 07:39:15] [debug] [http_client] server kubernetes.default.svc:443 will close connection #70
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] Request (ns=production, pod=s.jumeirah-d647cd446-pm9z9) http_do=0, HTTP Status: 404
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] HTTP response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"s.jumeirah-d647cd446-pm9z9\" not found","reason":"NotFound","details":{"name":"s.jumeirah-d647cd446-pm9z9","kind":"pods"},"code":404}
[2023/10/18 07:39:15] [debug] [input chunk] update output instances with new chunk size diff=300, records=1, input=emitter_for_rewrite_tag.3
[2023/10/18 07:39:15] [debug] [input chunk] update output instances with new chunk size diff=329, records=1, input=emitter_for_rewrite_tag.3
[2023/10/18 07:39:15] [debug] [input chunk] update output instances with new chunk size diff=334, records=1, input=emitter_for_rewrite_tag.3
[2023/10/18 07:39:15] [debug] [input chunk] update output instances with new chunk size diff=963, records=3, input=tail.0
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2023/10/18 07:39:15] [debug] [http_client] not using http_proxy for header
[2023/10/18 07:39:15] [debug] [http_client] server kubernetes.default.svc:443 will close connection #70
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] Request (ns=production, pod=s.jumeirah-d647cd446-cvkvn) http_do=0, HTTP Status: 404
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] HTTP response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"s.jumeirah-d647cd446-cvkvn\" not found","reason":"NotFound","details":{"name":"s.jumeirah-d647cd446-cvkvn","kind":"pods"},"code":404}
[2023/10/18 07:39:15] [debug] [input chunk] update output instances with new chunk size diff=365, records=1, input=emitter_for_rewrite_tag.3
[2023/10/18 07:39:15] [debug] [input chunk] update output instances with new chunk size diff=365, records=1, input=tail.0
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2023/10/18 07:39:15] [debug] [http_client] not using http_proxy for header
[2023/10/18 07:39:15] [debug] [http_client] server kubernetes.default.svc:443 will close connection #70
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] Request (ns=production, pod=s.jumeirah-d647cd446-k424z) http_do=0, HTTP Status: 404
[2023/10/18 07:39:15] [debug] [filter:kubernetes:kubernetes.0] HTTP response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"s.jumeirah-d647cd446-k424z\" not found","reason":"NotFound","details":{"name":"s.jumeirah-d647cd446-k424z","kind":"pods"},"code":404}
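The pod names in the 404 responses above carry a stray "s." prefix, which is consistent with a Kube_Tag_Prefix that does not match the tag set by the tail input. A minimal Python sketch of that reading follows. It assumes the kubernetes filter skips as many characters as the configured prefix is long before parsing the pod name out of the tag; strip_prefix_by_length is a hypothetical helper that only models this assumed behavior.

```python
# Sketch: model the assumed prefix handling of the kubernetes filter.
# Assumption: the filter drops len(Kube_Tag_Prefix) characters from the tag
# and parses the remainder as <pod>_<namespace>_<container>-<id>.log.

def strip_prefix_by_length(tag: str, kube_tag_prefix: str) -> str:
    """Return what is left of the tag after skipping the prefix length."""
    return tag[len(kube_tag_prefix):]

# Tag produced by `Tag s3logs.*` for the file named in the log output.
tag = (
    "s3logs.var.log.containers."
    "jumeirah-d647cd446-pm9z9_production_jumeirah-"
    "eb74b3c1f3a47ec7d5c922065122784847a5f31060917ae22633bdfca68907bd.log"
)

wrong = strip_prefix_by_length(tag, "kube.var.log.containers.")
right = strip_prefix_by_length(tag, "s3logs.var.log.containers.")

# The pod name is the part before the first underscore.
print(wrong.split("_")[0])  # s.jumeirah-d647cd446-pm9z9
print(right.split("_")[0])  # jumeirah-d647cd446-pm9z9
```

Under this model, the 24-character prefix "kube.var.log.containers." leaves the last two characters of "s3logs.var.log.containers." in place, which reproduces the pod name seen in the API request.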
Fluent Bit Version Info
Which AWS for Fluent Bit Versions have you tried? latest stable
Which versions have you seen the issue in? Are there any versions where you do not see the issue? All versions
Cluster Details
- What is the networking setup? The cluster uses the AWS CNI provided by AWS EKS
- Do you use App Mesh or a service mesh? No
- Do you use VPC endpoints in a network-restricted VPC? No
- EKS
  - EC2
  - Daemon for Fluent Bit
Application Details
Unknown
Steps to reproduce issue
Use the configuration above and the logs go in the wrong path.
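The observed object key matches what the s3_key_format above produces when the tag was never rewritten. The sketch below is a simplified Python model, assuming $TAG[n] is the n-th part of the tag split on "." (the plugin's default delimiter). expand_tag_parts is a hypothetical helper, both tags use made-up values, and the time, ${podid}, index and extension parts of the format are left out.

```python
import re

def expand_tag_parts(key_format: str, tag: str, delimiters: str = ".") -> str:
    """Replace each $TAG[n] with the n-th tag part (simplified model)."""
    parts = re.split("[" + re.escape(delimiters) + "]", tag)

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        # Assumption of this model: an out-of-range index expands to "".
        return parts[index] if index < len(parts) else ""

    return re.sub(r"\$TAG\[(\d+)\]", substitute, key_format)

key_format = "/$TAG[1]/$TAG[2]/$TAG[3]/$TAG[3]-$TAG[1]"

# Tag as emitted by the tail input, not rewritten (illustrative file name).
original_tag = "s3logs.var.log.containers.app_ns_app-abc.log"
# Tag the rewrite_tag rule was meant to produce (illustrative values):
# s3.<namespace_name>.<app_name>.<container_name>.<pod_id>
rewritten_tag = "s3.production.web.nginx.pod-id"

print(expand_tag_parts(key_format, original_tag))   # /var/log/containers/containers-var
print(expand_tag_parts(key_format, rewritten_tag))  # /production/web/nginx/nginx-production
```

The first result has the same var/log/containers/containers-var shape as the path reported in the issue, which suggests the S3 output is receiving records that still carry the tail tag.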
The goal of this is to match all log messages, right?
$log ^.*$
Try this instead:
$log ^[\S]+$
I found this while working on this project. That was the regex that worked in FLB to match all logs. IIRC, I tried the same regex you have and it didn't work; I am not sure why: https://github.com/aws/aws-for-fluent-bit/pull/499/files#diff-1413562a024b7a0a612040a520fe770ac13e9d3fcc799d78bd48a808a6230905R23
I'll add a debugging guide entry for this.
I'll also update this tutorial: https://github.com/aws/aws-for-fluent-bit/tree/dev/use_cases/k8s-metadata-customize-tag
I am still not able to send logs to the correct path in S3. I have tried updating the config in several different ways. Below is my latest config.
apiVersion: v1
data:
  custom_parsers.conf: |
    [PARSER]
        Name docker_no_time
        Format json
        Time_Keep Off
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
  fluent-bit.conf: |
    [SERVICE]
        Daemon Off
        Flush 1
        Log_Level debug
        Parsers_File parsers.conf
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2022
        Health_Check On
    [INPUT]
        Name tail
        Tag s3logs.*
        Path /var/log/containers/*.log
        parser cri
        multiline.parser cri
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Skip_Empty_Lines On
        Refresh_Interval 10
    [FILTER]
        Name kubernetes
        Match s3logs.*
        Kube_Tag_Prefix s3logs.var.log.containers.
        Merge_Log On
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Keep_Log Off
        Labels Off
        Annotations Off
    [FILTER]
        Name rewrite_tag
        Match s3logs.*
        Rule $kubernetes['namespace_name'] ^[a-zA-Z0-9-_]*$ $kubernetes['namespace_name'].$kubernetes['container_name'].$kubernetes['pod_id'] false
    [FILTER]
        Name record_modifier
        Match s3logs.*
        Record cluster_name ${CLUSTER}
    [OUTPUT]
        Name s3
        Match s3logs.*
        bucket logs.company.us-east-1.domain.com
        region us-east-1
        s3_key_format /%Y/%m/%d/$TAG[1]/$TAG[2]/$TAG[2]-$TAG[1]-%Y%m%d-$TAG[3].txt
        store_dir /var/log/fluentbit-s3-buffers
        total_file_size 256MB
        upload_timeout 2m
        use_put_object On
        compression gzip
        preserve_data_ordering On
Did you try my suggestion here? => #748 (comment)
Also, your input sets this tag:
[INPUT]
    Name tail
    Tag s3logs.*
But the rewrite_tag rule then changes the tag so that it starts with $kubernetes['namespace_name'], while your S3 match pattern is still:
Match s3logs.*
So I think your S3 output only matches logs whose tag was not rewritten by the rewrite_tag filter.
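The routing point in this comment can be illustrated with a small Python sketch. It approximates the output's wildcard Match with fnmatch.fnmatchcase, which is an assumption made for illustration and not the router's actual implementation. output_matches is a hypothetical helper and the tag values are made up.

```python
from fnmatch import fnmatchcase

def output_matches(match_pattern: str, tag: str) -> bool:
    """Approximate an output's Match rule with shell-style wildcards."""
    return fnmatchcase(tag, match_pattern)

# Tag set by the tail input (illustrative file name).
original_tag = "s3logs.var.log.containers.app_ns_app-abc.log"
# Tag shape produced by the latest rule: namespace.container.pod_id
rewritten_tag = "production.nginx.pod-id"
# Same tag with a fixed prefix, as in the first configuration's rule.
prefixed_tag = "s3." + rewritten_tag

print(output_matches("s3logs.*", original_tag))   # True
print(output_matches("s3logs.*", rewritten_tag))  # False
print(output_matches("s3.*", prefixed_tag))       # True
print(output_matches("s3.*", original_tag))       # False
```

Under this model, an output with Match s3logs.* never sees the rewritten records. Giving the new tag a fixed prefix and matching that prefix in the output is one way to route only rewritten records to S3.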