kiwigrid/helm-charts

LivenessProbe failed

Closed this issue · 4 comments

I have filtered the fluentd logs to exclude stdout and send only stderr. I am using the fluentd-elasticsearch chart version 6.1.0, and here are my values:

  elasticsearch:
    host: "aws-elasticseach"
    port: 443
    logstashPrefix: "kubelog"
    scheme: "https"
  additionalPlugins:
  - name: fluent-plugin-rewrite-tag-filter
    version: 2.1.1
  livenessProbe:
    enabled: true
    initialDelaySeconds: 600
    periodSeconds: 60
    kind:
      exec:
        command:
        - '/bin/sh'
        - '-c'
        - >
          LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
          STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
          if [ ! -e /var/log/fluentd-buffers ];
          then
            echo "first"
            exit 1;
          fi;
          touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
          if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit)" ];
          then
            rm -rf /var/log/fluentd-buffers;
            echo "second: STUCK_THRESHOLD_SECONDS=$STUCK_THRESHOLD_SECONDS"
            exit 1;
          fi;
          touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
          if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-liveness -print -quit)" ];
          then
            echo "third: LIVENESS_THRESHOLD_SECONDS=$LIVENESS_THRESHOLD_SECONDS"
            exit 1;
          fi;
          
  configMaps:
    useDefaults:
      containersInputConf: false
      outputConf: false

  extraConfigMaps:
    containers.input.conf: |-
      <source>
        @id fluentd-containers.log
        @type tail
        path /var/log/containers/*.log
        pos_file /var/log/containers.log.pos
        tag raw.kubernetes.*
        read_from_head true
        <parse>
          @type multi_format
          <pattern>
            format json
            time_key time
            time_format %Y-%m-%dT%H:%M:%S.%NZ
          </pattern>
          <pattern>
            format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
            time_format %Y-%m-%dT%H:%M:%S.%N%:z
          </pattern>
        </parse>
      </source>
      
      # Detect exceptions in the log output and forward them as one log entry.
      <match raw.kubernetes.**>
        @id raw.kubernetes
        @type detect_exceptions
        remove_tag_prefix raw
        message log
        stream stream
        multiline_flush_interval 5
        max_bytes 500000
        max_lines 1000
      </match>

      # Concatenate multi-line logs
      <filter .**>
        @id filter_concat
        @type concat
        key message
        multiline_end_regexp /\n$/
        separator ""
        timeout_label @NORMAL
        flush_interval 5
      </filter>

      # Enriches records with Kubernetes metadata
      <filter kubernetes.**>
        @id filter_kubernetes_metadata
        @type kubernetes_metadata
      </filter>
      
      # Exclude stdout and keep stderr from kubernetes log
      <filter kubernetes.**>
        @type grep
        <regexp>
          key $.stream
          pattern /stderr/
        </regexp>
        <exclude>
          key $.stream
          pattern /stdout/
        </exclude>
      </filter>

      # Fixes json fields in Elasticsearch
      <filter kubernetes.**>
        @id filter_parser
        @type parser
        key_name log
        reserve_time true
        reserve_data true
        remove_key_name_field true
        <parse>
          @type multi_format
          <pattern>
            format json
          </pattern>
          <pattern>
            format none
          </pattern>
        </parse>
      </filter>

    output.conf: |-
      <match **>
        @type relabel
        @label @NORMAL
        @type rewrite_tag_filter
        <rule>
          key $.kubernetes.namespace_name
          pattern ^(.+)$
          tag $1.${tag}
        </rule>
      </match>

      <label @NORMAL>
      <match **>
        @id elasticsearch
        @type elasticsearch
        @log_level "#{ENV['OUTPUT_LOG_LEVEL']}"
        include_tag_key true
        host "#{ENV['OUTPUT_HOST']}"
        port "#{ENV['OUTPUT_PORT']}"
        path "#{ENV['OUTPUT_PATH']}"
        scheme "#{ENV['OUTPUT_SCHEME']}"
        ssl_verify "#{ENV['OUTPUT_SSL_VERIFY']}"
        ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
        type_name "#{ENV['OUTPUT_TYPE_NAME']}"
        logstash_format true
        logstash_prefix "#{ENV['LOGSTASH_PREFIX']}"
        reconnect_on_error true
        <buffer>
          @type file
          path /var/log/fluentd-buffers/kubernetes.system.buffer
          flush_mode interval
          retry_type exponential_backoff
          flush_thread_count 2
          flush_interval 5s
          retry_forever
          retry_max_interval 30
          chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
          queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
          overflow_action block
        </buffer>
      </match>
      </label>

The second condition of the liveness probe always fails, but I have no idea why!
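
For reference, the second condition is the "stuck" check: it fails (and deletes /var/log/fluentd-buffers) when no directory under that path has been modified within STUCK_THRESHOLD_SECONDS (900 seconds by default). One way to see exactly what the probe sees is to run the same commands by hand inside the pod. This is only a sketch; the pod name is a placeholder:

```sh
# Sketch: reproduce the probe's "stuck" check by hand.
# POD is a placeholder; substitute your actual fluentd pod name.
POD=fluentd-elasticsearch-xxxxx
kubectl exec "$POD" -- /bin/sh -c '
  STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
  ls -ld /var/log/fluentd-buffers;
  touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
  # Empty output below means no buffer directory was touched recently,
  # i.e. the probe would trip the second condition and restart the pod.
  find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print
'
```

If the find output is empty even though fluentd is running fine, it may simply mean nothing has been written to the file buffer within that window (for example because most records are filtered out), in which case raising the threshold or disabling the probe is an option.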

I am experiencing the same problem.
Running on AKS, version 6.1.0

Is it possible that something is wrong with the probe script?

      # Liveness probe is aimed to help in situations where fluentd
      # silently hangs for no apparent reasons until manual restart.
      # The idea of this probe is that if fluentd is not queueing or
      # flushing chunks for 5 minutes, something is not right. If
      # you want to change the fluentd configuration, reducing amount of
      # logs fluentd collects, consider changing the threshold or turning
      # liveness probe off completely.
      - '/bin/sh'
      - '-c'
      - >
        LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
        STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
        if [ ! -e /var/log/fluentd-buffers ];
        then
          exit 1;
        fi;
        touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
        if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit)" ];
        then
          rm -rf /var/log/fluentd-buffers;
          exit 1;
        fi;
        touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
        if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-liveness -print -quit)" ];
        then
          exit 1;
        fi;
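
The comment at the top of that default probe is relevant here: if the configuration legitimately produces few buffer writes (for example because most records are filtered out), it suggests changing the threshold or turning the probe off. A minimal values sketch, with example numbers only:

```yaml
# Sketch only: either disable the probe while debugging ...
livenessProbe:
  enabled: false

# ... or keep it enabled and raise the inline defaults in the overridden
# command (as in the values posted above), e.g.:
#   STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-1800};
#   LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-600};
```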

Chart version 6.1.1 works for me on GKE, Kubernetes 1.14, and ES 7.6.0.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.