kiwigrid/helm-charts

LivenessProbe failed

Closed this issue · 4 comments

I have filtered the fluentd logs to exclude stdout and send only stderr. I am using the fluentd-elasticsearch chart version 6.1.0, and here are my values:

  elasticsearch:
    host: "aws-elasticseach"
    port: 443
    logstashPrefix: "kubelog"
    scheme: "https"
  additionalPlugins:
  - name: fluent-plugin-rewrite-tag-filter
    version: 2.1.1
  livenessProbe:
    enabled: true
    initialDelaySeconds: 600
    periodSeconds: 60
    kind:
      exec:
        command:
        - '/bin/sh'
        - '-c'
        - >
          LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
          STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
          if [ ! -e /var/log/fluentd-buffers ];
          then
            echo "first"
            exit 1;
          fi;
          touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
          if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit)" ];
          then
            rm -rf /var/log/fluentd-buffers;
            echo "second: STUCK_THRESHOLD_SECONDS=$STUCK_THRESHOLD_SECONDS"
            exit 1;
          fi;
          touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
          if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-liveness -print -quit)" ];
          then
            echo "third: LIVENESS_THRESHOLD_SECONDS=$LIVENESS_THRESHOLD_SECONDS"
            exit 1;
          fi;
          
  configMaps:
    useDefaults:
      containersInputConf: false
      outputConf: false

  extraConfigMaps:
    containers.input.conf: |-
      <source>
        @id fluentd-containers.log
        @type tail
        path /var/log/containers/*.log
        pos_file /var/log/containers.log.pos
        tag raw.kubernetes.*
        read_from_head true
        <parse>
          @type multi_format
          <pattern>
            format json
            time_key time
            time_format %Y-%m-%dT%H:%M:%S.%NZ
          </pattern>
          <pattern>
            format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
            time_format %Y-%m-%dT%H:%M:%S.%N%:z
          </pattern>
        </parse>
      </source>
      
      # Detect exceptions in the log output and forward them as one log entry.
      <match raw.kubernetes.**>
        @id raw.kubernetes
        @type detect_exceptions
        remove_tag_prefix raw
        message log
        stream stream
        multiline_flush_interval 5
        max_bytes 500000
        max_lines 1000
      </match>

      # Concatenate multi-line logs
      <filter .**>
        @id filter_concat
        @type concat
        key message
        multiline_end_regexp /\n$/
        separator ""
        timeout_label @NORMAL
        flush_interval 5
      </filter>

      # Enriches records with Kubernetes metadata
      <filter kubernetes.**>
        @id filter_kubernetes_metadata
        @type kubernetes_metadata
      </filter>
      
      # Exclude stdout and keep stderr from kubernetes log
      <filter kubernetes.**>
        @type grep
        <regexp>
          key $.stream
          pattern /stderr/
        </regexp>
        <exclude>
          key $.stream
          pattern /stdout/
        </exclude>
      </filter>

      # Fixes json fields in Elasticsearch
      <filter kubernetes.**>
        @id filter_parser
        @type parser
        key_name log
        reserve_time true
        reserve_data true
        remove_key_name_field true
        <parse>
          @type multi_format
          <pattern>
            format json
          </pattern>
          <pattern>
            format none
          </pattern>
        </parse>
      </filter>

    output.conf: |-
      <match **>
        @type relabel
        @label @NORMAL
        @type rewrite_tag_filter
        <rule>
          key $.kubernetes.namespace_name
          pattern ^(.+)$
          tag $1.${tag}
        </rule>
      </match>

      <label @NORMAL>
      <match **>
        @id elasticsearch
        @type elasticsearch
        @log_level "#{ENV['OUTPUT_LOG_LEVEL']}"
        include_tag_key true
        host "#{ENV['OUTPUT_HOST']}"
        port "#{ENV['OUTPUT_PORT']}"
        path "#{ENV['OUTPUT_PATH']}"
        scheme "#{ENV['OUTPUT_SCHEME']}"
        ssl_verify "#{ENV['OUTPUT_SSL_VERIFY']}"
        ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
        type_name "#{ENV['OUTPUT_TYPE_NAME']}"
        logstash_format true
        logstash_prefix "#{ENV['LOGSTASH_PREFIX']}"
        reconnect_on_error true
        <buffer>
          @type file
          path /var/log/fluentd-buffers/kubernetes.system.buffer
          flush_mode interval
          retry_type exponential_backoff
          flush_thread_count 2
          flush_interval 5s
          retry_forever
          retry_max_interval 30
          chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
          queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
          overflow_action block
        </buffer>
      </match>
      </label>

The second condition of the liveness probe always fails, but I have no idea why!
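
For reference, the second condition is the "stuck" check: it fails (and deletes /var/log/fluentd-buffers) when no directory under that path has been modified within STUCK_THRESHOLD_SECONDS (900 seconds by default). One way to see exactly what the probe sees is to run the same commands by hand inside the pod. This is only a sketch; the pod name is a placeholder:

```sh
# Sketch: reproduce the probe's "stuck" check by hand.
# POD is a placeholder; substitute your actual fluentd pod name.
POD=fluentd-elasticsearch-xxxxx
kubectl exec "$POD" -- /bin/sh -c '
  STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
  ls -ld /var/log/fluentd-buffers;
  touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
  # Empty output below means no buffer directory was touched recently,
  # i.e. the probe would trip the second condition and restart the pod.
  find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print
'
```

If the find output is empty even though fluentd is running fine, it may simply mean nothing has been written to the file buffer within that window (for example because most records are filtered out), in which case raising the threshold or disabling the probe is an option.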

I am experiencing the same problem.
Running on AKS, version 6.1.0

Is it possible that something is wrong with the probe script?

      # Liveness probe is aimed to help in situations where fluentd
      # silently hangs for no apparent reasons until manual restart.
      # The idea of this probe is that if fluentd is not queueing or
      # flushing chunks for 5 minutes, something is not right. If
      # you want to change the fluentd configuration, reducing amount of
      # logs fluentd collects, consider changing the threshold or turning
      # liveness probe off completely.
      - '/bin/sh'
      - '-c'
      - >
        LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
        STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
        if [ ! -e /var/log/fluentd-buffers ];
        then
          exit 1;
        fi;
        touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
        if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit)" ];
        then
          rm -rf /var/log/fluentd-buffers;
          exit 1;
        fi;
        touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
        if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-liveness -print -quit)" ];
        then
          exit 1;
        fi;
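
The comment at the top of that default probe is relevant here: if the configuration legitimately produces few buffer writes (for example because most records are filtered out), it suggests changing the threshold or turning the probe off. A minimal values sketch, with example numbers only:

```yaml
# Sketch only: either disable the probe while debugging ...
livenessProbe:
  enabled: false

# ... or keep it enabled and raise the inline defaults in the overridden
# command (as in the values posted above), e.g.:
#   STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-1800};
#   LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-600};
```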

Chart version 6.1.1 works for me on GKE, Kubernetes 1.14, and ES 7.6.0.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.