Issue deploying logging in 3.11
Closed this issue · 9 comments
Deploying logging in a 3.11 cluster fails with the following error.
The full traceback is:
WARNING: The below traceback may *not* be related to the actual failure.
File "/tmp/ansible_command_payload_svseXs/ansible_command_payload.zip/ansible/module_utils/basic.py", line 2561, in run_command
cmd = subprocess.Popen(args, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
fatal: [master01.domain -> localhost]: FAILED! => {
"changed": false,
"cmd": "patch --force --quiet -u /tmp/openshift-logging-ansible-yb1r0N/configmap_new_file /tmp/openshift-logging-ansible-yb1r0N/patch.patch",
"invocation": {
"module_args": {
"_raw_params": "patch --force --quiet -u /tmp/openshift-logging-ansible-yb1r0N/configmap_new_file /tmp/openshift-logging-ansible-yb1r0N/patch.patch",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"msg": "[Errno 2] No such file or directory",
"rc": 2
}
Using the latest openshift-ansible playbooks: openshift-ansible-3.11.343-1
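The "[Errno 2] No such file or directory" with rc=2 comes from subprocess.Popen failing to spawn the patch executable itself, which usually means the patch utility is not installed on the host running the task (localhost here, i.e. the Ansible control host). A minimal check, assuming a yum-based control host where the package is simply named patch:

# Confirm the patch binary exists on the Ansible control host
command -v patch || echo "patch is not installed"
# Install it if missing (assumes a yum-based host)
sudo yum install -y patch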
Inventory
openshift_logging_image_version=v3.11.0
openshift_logging_use_ops=true
openshift_logging_install_logging=true
openshift_logging_master_url=https://cluster11.domain:8443
openshift_logging_install_eventrouter=true
openshift_logging_eventrouter_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_curator_default_days=15
openshift_logging_curator_run_hour=23
openshift_logging_curator_run_minute=00
openshift_logging_curator_run_timezone=America/NewYork
openshift_logging_curator_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_es_memory_limit=32Gi
openshift_logging_es_ops_memory_limit=16Gi
openshift_logging_kibana_hostname=logging.prod11.domain
openshift_logging_fluentd_audit_container_engine=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_storage_class_name=glusterfs
openshift_logging_es_pvc_size=250Gi
openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_prefix=logging-es
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_es_ops_pvc_dynamic=true
openshift_logging_es_ops_pvc_storage_class_name=glusterfs
openshift_logging_es_ops_pvc_size=250Gi
openshift_logging_es_ops_pvc_prefix=logging-ops-es
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_es_number_of_replicas=1
openshift_logging_fluentd_image=quay.io/openshift/origin-logging-fluentd:v3.11.0
openshift_logging_kibana_image=quay.io/openshift/origin-loging-kibana5:v3.11.0
openshift_logging_curator_image=quay.io/openshift/origin-logging-curator:v3.11.0
openshift_logging_eventrouter_image=quay.io/openshift/origin-logging-eventrouter:v3.11.0
openshift_logging_elasticsearch_image=quay.io/openshift/origin-logging-elasticsearch5:v3.11.0
openshift_logging_es_cluster_size=3
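As an aside, the kibana image line above contains a typo (origin-loging-kibana5); the intended value is presumably the standard 3.11 image name:

openshift_logging_kibana_image=quay.io/openshift/origin-logging-kibana5:v3.11.0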
Can you please attach logs with -vvv enabled?
Jeff, exactly which log(s) are you referring to? Ansible?
Rerun the playbook to enable more verbose logging with -vvv and attach the outcome.
Jeff, the log was uploaded. I also noticed a typo in the inventory for the logging-kibana image, and I reverted to a previous ansible playbook, which deployed OK. With a little adjustment I got logging-es-data-master and logging-kibana running, but es-ops-data and ops-kibana crash. Fluentd is up and running on all nodes with no issues. While I can see the storage being used, the Kibana dashboard returns an empty result. It appears to be "temporarily failed to flush the buffer", as I see in the fluentd logs on the nodes.
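A quick way to narrow down the "temporarily failed to flush the buffer" messages is to confirm that fluentd can actually reach the (ops) Elasticsearch cluster. A hedged sketch, assuming the default openshift-logging namespace and that the es_util helper is present in the logging-elasticsearch5 image:

# List the logging pods and confirm the es-ops and ops-kibana pods are Running
oc -n openshift-logging get pods -o wide
# Inspect why a crashing es-ops pod is failing (replace <pod> with the actual name)
oc -n openshift-logging logs <pod> -c elasticsearch
# Check cluster health from inside a running Elasticsearch pod
oc -n openshift-logging exec <pod> -c elasticsearch -- es_util --query=_cat/health?v

If the ops cluster never reaches green or yellow, fluentd will keep buffering and Kibana will stay empty.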
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.