LivenessProbe failed
Closed this issue · 4 comments
kiyanabah commented
I have filtered fluentd logs to exclude stdout and sending only stderr.. I am using the fluentd-elasticsearch chart version 6.1.0 and here is my values:
```yaml
elasticsearch:
  host: "aws-elasticseach"
  port: 443
  logstashPrefix: "kubelog"
  scheme: "https"

additionalPlugins:
  - name: fluent-plugin-rewrite-tag-filter
    version: 2.1.1

livenessProbe:
  enabled: true
  initialDelaySeconds: 600
  periodSeconds: 60
  kind:
    exec:
      command:
        - '/bin/sh'
        - '-c'
        - >
          LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
          STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
          if [ ! -e /var/log/fluentd-buffers ];
          then
            echo "first"
            exit 1;
          fi;
          touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
          if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit)" ];
          then
            rm -rf /var/log/fluentd-buffers;
            echo "second: STUCK_THRESHOLD_SECONDS=$STUCK_THRESHOLD_SECONDS"
            exit 1;
          fi;
          touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
          if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-liveness -print -quit)" ];
          then
            echo "third: LIVENESS_THRESHOLD_SECONDS=$LIVENESS_THRESHOLD_SECONDS"
            exit 1;
          fi;

configMaps:
  useDefaults:
    containersInputConf: false
    outputConf: false

extraConfigMaps:
  containers.input.conf: |-
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/containers.log.pos
      tag raw.kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
        <pattern>
          format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </pattern>
      </parse>
    </source>
    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>
    # Concatenate multi-line logs
    <filter .**>
      @id filter_concat
      @type concat
      key message
      multiline_end_regexp /\n$/
      separator ""
      timeout_label @NORMAL
      flush_interval 5
    </filter>
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
    </filter>
    # Exclude stdout and keep stderr from kubernetes log
    <filter kubernetes.**>
      @type grep
      <regexp>
        key $.stream
        pattern /stderr/
      </regexp>
      <exclude>
        key $.stream
        pattern /stdout/
      </exclude>
    </filter>
    # Fixes json fields in Elasticsearch
    <filter kubernetes.**>
      @id filter_parser
      @type parser
      key_name log
      reserve_time true
      reserve_data true
      remove_key_name_field true
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse>
    </filter>
  output.conf: |-
    <match **>
      @type relabel
      @label @NORMAL
      @type rewrite_tag_filter
      <rule>
        key $.kubernetes.namespace_name
        pattern ^(.+)$
        tag $1.${tag}
      </rule>
    </match>
    <label @NORMAL>
      <match **>
        @id elasticsearch
        @type elasticsearch
        @log_level "#{ENV['OUTPUT_LOG_LEVEL']}"
        include_tag_key true
        host "#{ENV['OUTPUT_HOST']}"
        port "#{ENV['OUTPUT_PORT']}"
        path "#{ENV['OUTPUT_PATH']}"
        scheme "#{ENV['OUTPUT_SCHEME']}"
        ssl_verify "#{ENV['OUTPUT_SSL_VERIFY']}"
        ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
        type_name "#{ENV['OUTPUT_TYPE_NAME']}"
        logstash_format true
        logstash_prefix "#{ENV['LOGSTASH_PREFIX']}"
        reconnect_on_error true
        <buffer>
          @type file
          path /var/log/fluentd-buffers/kubernetes.system.buffer
          flush_mode interval
          retry_type exponential_backoff
          flush_thread_count 2
          flush_interval 5s
          retry_forever
          retry_max_interval 30
          chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
          queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
          overflow_action block
        </buffer>
      </match>
    </label>
```
The second condition of the liveness probe always fails, but I have no idea why!
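For anyone trying to reproduce this outside the pod: the "second" check passes only if some directory under /var/log/fluentd-buffers has an mtime within the last STUCK_THRESHOLD_SECONDS. A minimal sketch of the mechanism, using a throwaway directory as a stand-in for the buffer path (GNU `touch -d` and `find -quit` assumed, as in the probe itself):

```shell
#!/bin/sh
# Stand-ins for /var/log/fluentd-buffers and /tmp/marker-stuck.
STUCK_THRESHOLD_SECONDS=900
BUFFER_DIR=$(mktemp -d)
MARKER=$(mktemp)

# Marker file backdated STUCK_THRESHOLD_SECONDS into the past.
touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" "$MARKER"

# The freshly created directory is newer than the marker, so find prints
# it and the check passes; if fluentd had not written to any buffer
# directory for 15 minutes, find would print nothing and the probe
# would wipe the buffers and exit 1.
if [ -z "$(find "$BUFFER_DIR" -type d -newer "$MARKER" -print -quit)" ]; then
  echo "stuck"
else
  echo "healthy"
fi
```

One hypothesis (not something the chart documents): if fluentd only receives a trickle of events, e.g. after filtering out all stdout, the buffer directories can legitimately go untouched for long stretches, which would trip exactly this check.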
kiwiidb commented
I am experiencing the same problem.
Running on AKS, version 6.1.0
kiwiidb commented
Is it possible something might be wrong with the probe script?
```yaml
# silently hangs for no apparent reasons until manual restart.
# The idea of this probe is that if fluentd is not queueing or
# flushing chunks for 5 minutes, something is not right. If
# you want to change the fluentd configuration, reducing amount of
# logs fluentd collects, consider changing the threshold or turning
# liveness probe off completely.
- '/bin/sh'
- '-c'
- >
  LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
  STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
  if [ ! -e /var/log/fluentd-buffers ];
  then
    exit 1;
  fi;
  touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
  if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit)" ];
  then
    rm -rf /var/log/fluentd-buffers;
    exit 1;
  fi;
  touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
  if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-liveness -print -quit)" ];
  then
    exit 1;
  fi;
```
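The script itself does what the comment describes; you can watch the "second" branch fire by backdating the only directory in a throwaway buffer tree (paths and thresholds below are stand-ins, GNU coreutils/findutils assumed):

```shell
#!/bin/sh
# Simulate a buffer tree fluentd has not written to for over 900 seconds.
BUFFER_DIR=$(mktemp -d)
MARKER=$(mktemp)
touch -d "1000 seconds ago" "$BUFFER_DIR"   # backdate the only directory
touch -d "900 seconds ago" "$MARKER"

# Nothing under BUFFER_DIR is newer than the marker, so find prints
# nothing and the probe's "stuck" branch fires (exit 1 after rm -rf).
if [ -z "$(find "$BUFFER_DIR" -type d -newer "$MARKER" -print -quit)" ]; then
  echo "stuck"
else
  echo "healthy"
fi
```

So the script logic is sound; whether a healthy-but-quiet fluentd updates its buffer directories often enough to satisfy it is the open question.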
monotek commented
6.1.1 works for me on GKE, Kubernetes 1.14 and ES 7.6.0.
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.