openshift/origin-aggregated-logging

Issue deploying logging in 3.11

Closed this issue · 9 comments

Deploying logging on a 3.11 cluster fails with the following error:

The full traceback is:
WARNING: The below traceback may *not* be related to the actual failure.
  File "/tmp/ansible_command_payload_svseXs/ansible_command_payload.zip/ansible/module_utils/basic.py", line 2561, in run_command
    cmd = subprocess.Popen(args, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception

fatal: [master01.domain -> localhost]: FAILED! => {
    "changed": false,
    "cmd": "patch --force --quiet -u /tmp/openshift-logging-ansible-yb1r0N/configmap_new_file /tmp/openshift-logging-ansible-yb1r0N/patch.patch",
    "invocation": {
        "module_args": {
            "_raw_params": "patch --force --quiet -u /tmp/openshift-logging-ansible-yb1r0N/configmap_new_file /tmp/openshift-logging-ansible-yb1r0N/patch.patch",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "msg": "[Errno 2] No such file or directory",
    "rc": 2
}
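A reading of the traceback: the exception is raised inside `subprocess.Popen` while the child process is being started, before `patch` itself ever runs. An `[Errno 2] No such file or directory` at that point usually means the executable (here `patch`) was not found on the host the task was delegated to (`localhost`), not that one of the `/tmp/...` files was missing. The sketch below assumes Python 3 on the control host. It reproduces that failure mode with a deliberately nonexistent program name and checks whether `patch` is on `PATH`. The helper names `find_executable` and `launch_error` are hypothetical.

```python
import errno
import shutil
import subprocess


def find_executable(name):
    """Return the absolute path of `name` on PATH, or None if it is not found."""
    return shutil.which(name)


def launch_error(argv):
    """Try to start argv; return the errno if the launch itself fails, else None.

    Ansible's run_command also goes through subprocess.Popen, and a missing
    executable fails at launch time with ENOENT (errno 2), as in the traceback.
    """
    try:
        proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError as exc:  # FileNotFoundError is a subclass of OSError
        return exc.errno
    proc.wait()
    return None


if __name__ == "__main__":
    # Hypothetical program name chosen so that it should not exist on PATH.
    code = launch_error(["no-such-program-for-demo", "--version"])
    print("launch errno:", code, "(ENOENT)" if code == errno.ENOENT else "")
    path = find_executable("patch")
    print("patch found at:", path if path else "not on PATH")
```

If `patch` turns out to be missing on the machine running the playbook, the first thing to try is installing it there (for example `yum install patch` on an RPM-based host) and re-running the playbook.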

Using the latest openshift-ansible playbooks:
openshift-ansible-3.11.343-1

Inventory

openshift_logging_image_version=v3.11.0
openshift_logging_use_ops=true
openshift_logging_install_logging=true
openshift_logging_master_url=https://cluster11.domain:8443
openshift_logging_install_eventrouter=true
openshift_logging_eventrouter_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_curator_default_days=15
openshift_logging_curator_run_hour=23
openshift_logging_curator_run_minute=00
openshift_logging_curator_run_timezone=America/NewYork
openshift_logging_curator_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_es_memory_limit=32Gi
openshift_logging_es_ops_memory_limit=16Gi
openshift_logging_kibana_hostname=logging.prod11.domain
openshift_logging_fluentd_audit_container_engine=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_storage_class_name=glusterfs
openshift_logging_es_pvc_size=250Gi
openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_prefix=logging-es
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_es_ops_pvc_dynamic=true
openshift_logging_es_ops_pvc_storage_class_name=glusterfs
openshift_logging_es_ops_pvc_size=250Gi
openshift_logging_es_ops_pvc_prefix=logging-ops-es
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/infra":"true"}
openshift_logging_es_number_of_replicas=1
openshift_logging_fluentd_image=quay.io/openshift/origin-logging-fluentd:v3.11.0
openshift_logging_kibana_image=quay.io/openshift/origin-loging-kibana5:v3.11.0
openshift_logging_curator_image=quay.io/openshift/origin-logging-curator:v3.11.0
openshift_logging_eventrouter_image=quay.io/openshift/origin-logging-eventrouter:v3.11.0
openshift_logging_elasticsearch_image=quay.io/openshift/origin-logging-elasticsearch5:v3.11.0
openshift_logging_es_cluster_size=3
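Most image entries in this inventory follow the pattern `quay.io/openshift/origin-logging-<component>:v3.11.0`, but the Kibana entry reads `origin-loging-kibana5`. A small pre-flight check could flag entries like that before the playbook runs. This is a sketch only. The expected prefix is inferred from the other entries, and `check_images` is a hypothetical helper, not part of openshift-ansible.

```python
import re

# Assumption: every logging image lives under this repository prefix,
# as the other *_image entries in the inventory do.
EXPECTED_PREFIX = "quay.io/openshift/origin-logging-"
IMAGE_KEY = re.compile(r"^openshift_logging_\w+_image$")


def check_images(inventory_text, prefix=EXPECTED_PREFIX):
    """Return (key, value) pairs for *_image variables whose value lacks the prefix."""
    problems = []
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and section headers
        key, value = line.split("=", 1)
        if IMAGE_KEY.match(key.strip()) and not value.strip().startswith(prefix):
            problems.append((key.strip(), value.strip()))
    return problems


sample = """
openshift_logging_fluentd_image=quay.io/openshift/origin-logging-fluentd:v3.11.0
openshift_logging_kibana_image=quay.io/openshift/origin-loging-kibana5:v3.11.0
openshift_logging_image_version=v3.11.0
"""

for key, value in check_images(sample):
    print(f"suspicious image for {key}: {value}")
```

Only the Kibana line is reported. `openshift_logging_image_version` is ignored because the key pattern requires the name to end in `_image`.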

Can you please attach logs with -vvv enabled?

Jeff, exactly which log(s) are you referring to? Ansible?

Rerun the playbook and enable more verbose logging with -vvv, then attach the output.
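One way to capture that output in a file for attaching is to set Ansible's `ANSIBLE_LOG_PATH` environment variable while running the playbook with `-vvv`. The sketch assumes the 3.11 logging playbook at `playbooks/openshift-logging/config.yml` inside an RPM-installed openshift-ansible. The inventory and log paths are placeholders, and `build_command` is a hypothetical helper.

```python
import os
import shutil
import subprocess

# Placeholder paths (assumptions): adjust to the actual inventory and checkout.
INVENTORY = "/etc/ansible/hosts"
PLAYBOOK = "/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml"
LOG_FILE = "/tmp/openshift-logging-vvv.log"


def build_command(inventory, playbook, log_file, verbosity=3):
    """Return (argv, env) for an ansible-playbook run with -v repeated `verbosity` times."""
    argv = ["ansible-playbook", "-i", inventory, playbook, "-" + "v" * verbosity]
    # ANSIBLE_LOG_PATH makes Ansible also write its output to this file.
    env = dict(os.environ, ANSIBLE_LOG_PATH=log_file)
    return argv, env


if __name__ == "__main__":
    argv, env = build_command(INVENTORY, PLAYBOOK, LOG_FILE)
    if shutil.which("ansible-playbook") is None:
        print("ansible-playbook is not on PATH; would run:", " ".join(argv))
    else:
        subprocess.run(argv, env=env, check=False)
```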

Jeff, the log has been uploaded. I also noticed a typo in the inventory for the logging-kibana image (origin-loging-kibana5), and I reverted to a previous openshift-ansible playbook version; with that, the deployment succeeded. After a little adjustment, logging-es-data-master and logging-kibana are running, but es-ops-data and ops-kibana crash. Fluentd is up and running on all nodes with no issues. Although I can see the storage being used, the Kibana dashboard returns no results. The cause appears to be the "temporarily failed to flush the buffer." error that I see in the fluentd logs on the nodes.
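To see how often fluentd hits the flush error, the node logs can be scanned for that message. This is a minimal sketch. It assumes the fluentd log text has already been collected (for example with `oc logs` on a fluentd pod) and that lines begin with a `YYYY-MM-DD HH:MM:SS` timestamp. `count_flush_failures` is a hypothetical helper. It only counts occurrences and does not diagnose why the flush to Elasticsearch failed.

```python
import re
from collections import Counter

# The warning quoted in the thread, matched case-insensitively.
FLUSH_FAILED = re.compile(r"temporarily failed to flush the buffer", re.IGNORECASE)
# Assumption: each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}")


def count_flush_failures(log_text):
    """Count flush-failure warnings per day; lines without a timestamp go under 'unknown'."""
    per_day = Counter()
    for line in log_text.splitlines():
        if FLUSH_FAILED.search(line):
            m = TIMESTAMP.match(line)
            per_day[m.group(1) if m else "unknown"] += 1
    return per_day
```

For example, `count_flush_failures(open("fluentd.log").read())` returns a `Counter` keyed by date, which makes it easier to see whether the failures are constant or started at a particular time.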

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.