openshift/origin-aggregated-logging

logging-fluentd Crashloopbackoff

Closed this issue · 4 comments

I have gone through all the previous issues about this problem and followed their recommendations, but I still can't get it to work. On every node, the logging-fluentd pod is in CrashLoopBackOff state.

oc v3.11.0+62803d0-1
kubernetes v1.11.0+d4cacc0
openshift-ansible-3.11.317-1-8-g1113fc1
ansible 2.8.2
quay.io/openshift/origin-logging-fluentd   v3.11  3643ce22b369 
Image ID:       docker-pullable://quay.io/openshift/origin-logging-fluentd@sha256:4fab9833ca57bf50648fcdc3f17453c92758600253df0f031e7460ea8fefacc0
2020-11-10 10:10:29 -0500 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
2020-11-10 10:10:29 -0500 [error]: /usr/share/ruby/net/http.rb:878:in `open'
2020-11-10 10:10:29 -0500 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
2020-11-10 10:10:29 -0500 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
2020-11-10 10:10:29 -0500 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
2020-11-10 10:10:29 -0500 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
2020-11-10 10:10:29 -0500 [error]: /usr/share/ruby/net/http.rb:851:in `start'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/rest-client-2.1.0/lib/restclient/resource.rb:51:in `get'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluent-plugin-kubernetes_metadata_filter-1.2.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:225:in `configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/agent.rb:145:in `add_filter'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/agent.rb:62:in `block in configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/agent.rb:57:in `each'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/agent.rb:57:in `configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/root_agent.rb:83:in `block in configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/root_agent.rb:83:in `each'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/root_agent.rb:83:in `configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/engine.rb:129:in `configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/engine.rb:103:in `run_configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:498:in `run_configure'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:183:in `block in start'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:375:in `call'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:375:in `main_process'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:179:in `start'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
2020-11-10 10:10:29 -0500 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2020-11-10 10:10:29 -0500 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/gems/fluentd-0.12.43/bin/fluentd:8:in `<top (required)>'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/bin/fluentd:23:in `load'
2020-11-10 10:10:29 -0500 [error]: /opt/app-root/src/bin/fluentd:23:in `<main>'

Consider providing a snapshot of the environment, as there is nothing to be gleaned from the provided stack trace: https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/hack/logging-dump.sh

Hi Jeff, have you had a chance to look at this?

Your logging system is grossly undersized. You have 20+ fluentd pods pounding a single-node ES instance configured with 1G of heap (-Xms512m -Xmx512m). For 3.11, we never recommend less than 16G per ES node, with a minimum of 3 nodes, each with as much memory as you can spare, up to 64G. Elasticsearch is a resource-intensive application that works best with plenty of memory and CPU and with high-IO disks. I suspect the stack trace you see from the collector is a request timeout caused by the ES node being unable to respond.
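
As a rough illustration of the sizing guidance above, here is a minimal Python sketch that compares a cluster description against the stated minimums (at least 3 ES nodes, at least 16G each). The `EsCluster` type and the `sizing_problems` helper are hypothetical names made up for this example; they are not part of OpenShift or openshift-ansible, and the numbers are taken only from the comment above.

```python
from dataclasses import dataclass

# Minimums quoted in the comment above for 3.11 (not an official API or tool).
MIN_NODES = 3
MIN_MEMORY_GIB = 16


@dataclass
class EsCluster:
    """Hypothetical description of an Elasticsearch logging cluster."""
    node_count: int
    memory_gib_per_node: float


def sizing_problems(cluster: EsCluster) -> list:
    """Return a list of human-readable sizing problems (empty if none)."""
    problems = []
    if cluster.node_count < MIN_NODES:
        problems.append(
            f"{cluster.node_count} ES node(s); guidance is at least {MIN_NODES}"
        )
    if cluster.memory_gib_per_node < MIN_MEMORY_GIB:
        problems.append(
            f"{cluster.memory_gib_per_node}G per ES node; "
            f"guidance is at least {MIN_MEMORY_GIB}G"
        )
    return problems


# The setup described in this issue: a single ES node with about 1G.
reported = EsCluster(node_count=1, memory_gib_per_node=1)
for problem in sizing_problems(reported):
    print(problem)
```

For the setup reported here, both checks fail; a cluster of 3 nodes with 16G each would pass both.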