openshift/ansible-service-broker

automation-broker-apb canary image hangs running tests sometimes

jmontleon opened this issue · 13 comments

CI hangs during canary tests maybe 25% of the time. gdb/strace output below.

I don't see anything glaringly obvious, unfortunately.

I do see some other similar-looking issues, but none of these is an exact match.
https://bugzilla.redhat.com/show_bug.cgi?id=1579432
https://groups.google.com/forum/#!topic/grpc-io/uI5vOtacknA
https://bugs.python.org/issue28854

Some of these have been fixed by updated packages in CentOS/RHEL. In particular, Python looks a bit out of date since the 7.6 release, so I'm going to start by updating apb-base canary and then updating automation-broker-apb canary. If that doesn't solve the issue, I may build an apb-base with additional debuginfo and debug tools so we can try to get to the bottom of what's happening.

/usr/bin/strace -f -p 2479
/usr/bin/strace: Process 2479 attached
futex(0x37d1320, FUTEX_WAIT_PRIVATE, 0, NULL

(gdb) bt
#0 0x00007f29349b9afb in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1 0x00007f29349b9b8f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007f29349b9c2b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x00007f2934cd9735 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#4 0x00007f2934cdd422 in lock_PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#5 0x00007f2934caccf0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#6 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#7 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#9 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x00007f2934cac6bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x00007f2934c38978 in function_call () from /lib64/libpython2.7.so.1.0
#13 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#14 0x00007f2934c22a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#15 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x00007f2934ca58f7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#17 0x00007f2934c6a98c in slot_tp_del () from /lib64/libpython2.7.so.1.0
#18 0x00007f2934c659ca in subtype_dealloc () from /lib64/libpython2.7.so.1.0
#19 0x00007f2934c491af in dict_dealloc () from /lib64/libpython2.7.so.1.0
#20 0x00007f2934c654b2 in subtype_clear () from /lib64/libpython2.7.so.1.0
#21 0x00007f2934cdc93a in collect () from /lib64/libpython2.7.so.1.0
#22 0x00007f2934cdd267 in _PyObject_GC_Malloc () from /lib64/libpython2.7.so.1.0
#23 0x00007f2934cdd2ad in _PyObject_GC_New () from /lib64/libpython2.7.so.1.0
#24 0x00007f2934c23a9d in PyMethod_New () from /lib64/libpython2.7.so.1.0
#25 0x00007f2934caa1f0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#26 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#27 0x00007f2934c38a6d in function_call () from /lib64/libpython2.7.so.1.0
#28 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#29 0x00007f2934c22a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#30 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#31 0x00007f2934c6aa87 in slot_tp_init () from /lib64/libpython2.7.so.1.0
#32 0x00007f2934c6979f in type_call () from /lib64/libpython2.7.so.1.0
#33 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#34 0x00007f2934ca76fd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#35 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#36 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#37 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#38 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#39 0x00007f2934cac6bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#40 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#41 0x00007f2934c38978 in function_call () from /lib64/libpython2.7.so.1.0
#42 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#43 0x00007f2934c13b45 in call_function_tail () from /lib64/libpython2.7.so.1.0
#44 0x00007f2934c13e7b in PyObject_CallMethod () from /lib64/libpython2.7.so.1.0
#45 0x00007f2934ca51bd in PyEval_ReInitThreads () from /lib64/libpython2.7.so.1.0
#46 0x00007f2934cdf9f9 in PyOS_AfterFork () from /lib64/libpython2.7.so.1.0
#47 0x00007f2934ce47d5 in posix_fork () from /lib64/libpython2.7.so.1.0
#48 0x00007f2934caca7c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#49 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#50 0x00007f2934c38978 in function_call () from /lib64/libpython2.7.so.1.0
#51 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#52 0x00007f2934c22a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#53 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#54 0x00007f2934c6aa87 in slot_tp_init () from /lib64/libpython2.7.so.1.0
#55 0x00007f2934c6979f in type_call () from /lib64/libpython2.7.so.1.0
#56 0x00007f2934c13a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#57 0x00007f2934ca8236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#58 0x00007f2934cac6bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#59 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#60 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#61 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#62 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#63 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#64 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#65 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#66 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#67 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#68 0x00007f2934cac53c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#69 0x00007f2934caf03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#70 0x00007f2934caf142 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#71 0x00007f2934cc857f in run_mod () from /lib64/libpython2.7.so.1.0
#72 0x00007f2934cc973e in PyRun_FileExFlags () from /lib64/libpython2.7.so.1.0
#73 0x00007f2934cca9c9 in PyRun_SimpleFileExFlags () from /lib64/libpython2.7.so.1.0
#74 0x00007f2934cdbb7f in Py_Main () from /lib64/libpython2.7.so.1.0
#75 0x00007f2933ef8445 in __libc_start_main () from /lib64/libc.so.6
#76 0x000000000040066e in _start ()
(gdb)

Updating base to 7.6 did not resolve the issue. I see the same gdb output. I'm going to add some debug packages to the image so we can troubleshoot further.

I think, if I'm following this correctly, this is where it's going bad:
https://github.com/kubernetes-client/python/blob/master/kubernetes/client/api_client.py#L78

But this isn't new in 8.0.0, and it doesn't explain why we only see it with canary. I'm pretty sure nightly is using 8.0.0 as well.
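To make the suspected failure mode concrete, here is a minimal sketch (a hypothetical stand-in, not the real kubernetes-client code) of the pattern at `api_client.py` line 78: a `ThreadPool` that gets joined from `__del__`. The danger is that this finalizer can fire during the garbage-collection pass Python 2.7 runs right after `os.fork()` (the `PyOS_AfterFork` frame in the backtrace), at which point the join waits on worker threads that don't exist in the child process.

```python
from multiprocessing.pool import ThreadPool

class MiniApiClient(object):
    """Hypothetical stand-in for kubernetes.client.ApiClient (v8.0.0)."""
    def __init__(self):
        # Worker threads used for dispatching async requests.
        self.pool = ThreadPool(4)

    def __del__(self):
        # The suspect pattern: tearing down the pool inside a finalizer.
        # If GC runs this right after a fork(), the join can block forever
        # on threads that were never copied into the child.
        self.pool.close()
        self.pool.join()

client = MiniApiClient()
print(client.pool.apply(lambda x: x * 2, (21,)))  # prints 42
del client  # harmless here; the hang needs the post-fork GC timing above
```

In a single process this runs cleanly, which matches why the hang is intermittent: it needs a fork (Ansible's `WorkerProcess.start()` in the py-bt above) to race with collection of a dead `ApiClient`.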

(gdb) py-bt
#4 Waiting for a lock (e.g. GIL)
#5 Waiting for a lock (e.g. GIL)
#7 Frame 0x259c430, for file /usr/lib64/python2.7/threading.py, line 339, in wait (self=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1bea03d890>, _RLock__count=0) at remote 0x7f1be83d1e90>, acquire=<instancemethod at remote 0x7f1be83fa1e0>, _is_owned=<instancemethod at remote 0x7f1be83fa320>, _release_save=<instancemethod at remote 0x7f1be83fa280>, release=<instancemethod at remote 0x7f1be83fa230>, _acquire_restore=<instancemethod at remote 0x7f1be83fa2d0>, _Verbose__verbose=False, _Condition__waiters=[<thread.lock at remote 0x7f1be84d60b0>]) at remote 0x7f1be84ccb50>, timeout=None, balancing=True, waiter=<thread.lock at remote 0x7f1be84d60b0>, saved_state=(1, 139758080321344))
    waiter.acquire()
#11 Frame 0x7f1bea0487f0, for file /usr/lib64/python2.7/threading.py, line 951, in join (self=<Thread(_Thread__ident=139756474193664, _Thread__block=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1bea03d890>, _RLock__count=0) at remote 0x7f1be83d1e90>, acquire=<instancemethod at remote 0x7f1be83fa1e0>, _is_owned=<instancemethod at remote 0x7f1be83fa320>, _release_save=<instancemethod at remote 0x7f1be83fa280>, release=<instancemethod at remote 0x7f1be83fa230>, _acquire_restore=<instancemethod at remote 0x7f1be83fa2d0>, _Verbose__verbose=False, _Condition__waiters=[<thread.lock at remote 0x7f1be84d60b0>]) at remote 0x7f1be84ccb50>, _Thread__name='Thread-21', _Thread__daemonic=True, _state=1, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1be84d6150>, _RLock__c...(truncated)
    self.__block.wait()
#15 Frame 0x7f1be9ff7218, for file /usr/lib64/python2.7/multiprocessing/pool.py, line 457, in join (self=<ThreadPool(_inqueue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x7f1be8541ad0>, maxsize=0, all_tasks_done=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f1be8637c90>, acquire=<built-in method acquire of thread.lock object at remote 0x7f1be8637c90>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f1be8637c90>) at remote 0x7f1be8570510>, mutex=<thread.lock at remote 0x7f1be8637c90>, not_full=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f1be8637c90>, acquire=<built-in method acquire of thread.lock object at remote 0x7f1be8637c90>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f1be8637c90>) at remote 0x7f1be8570290>, not_empty=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f1be8637c90>, acquire=<built-in me...(truncated)
    self._worker_handler.join()
#18 Frame 0x7f1bea98bc20, for file /usr/lib/python2.7/site-packages/kubernetes/client/api_client.py, line 78, in __del__ (self=<ApiClient(default_headers={'User-Agent': 'Swagger-Codegen/8.0.0/python'}, configuration=<Configuration(username='', ssl_ca_cert='/var/run/secrets/kubernetes.io/serviceaccount/ca.crt', cert_file=None, connection_pool_maxsize=40, verify_ssl=True, proxy=None, host='https://172.30.0.1:443', password='', key_file=None, assert_hostname=None, api_key_prefix={}, logger_stream_handler=<StreamHandler(stream=<file at remote 0x7f1bf6b981e0>, level=0, lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1bea03d190>, _RLock__count=0) at remote 0x7f1be9ff3b50>, _name=None, filters=[], formatter=<Formatter(datefmt=None, _fmt='%(asctime)s %(levelname)s %(message)s') at remote 0x7f1be9ff3b10>) at remote 0x7f1be9ff3ad0>, temp_folder_path=None, _Configuration__debug=False, logger_formatter=<...>, _Configuration__logger_file=None, logger={'package_logger': <Logge...(truncated)
    self.pool.join()
#30 Garbage-collecting
#36 Frame 0x7f1bf06b9810, for file /usr/lib64/python2.7/threading.py, line 271, in __init__ (self=<_Condition(_Verbose__verbose=False, _Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1be84d6150>, _RLock__count=0) at remote 0x7f1be83d1ed0>, acquire=<instancemethod at remote 0x7f1be83fa370>, _Condition__waiters=[], release=<instancemethod at remote 0x7f1be83fa3c0>) at remote 0x7f1be84cc9d0>, lock=<...>, verbose=None)
    self._release_save = lock._release_save
#40 Frame 0x31f7950, for file /usr/lib64/python2.7/threading.py, line 569, in _reset_internal_locks (self=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1be84d6150>, _RLock__count=0) at remote 0x7f1be83d1ed0>, acquire=<instancemethod at remote 0x7f1be83fa370>, _Condition__waiters=[], release=<instancemethod at remote 0x7f1be83fa3c0>) at remote 0x7f1be84cc9d0>) at remote 0x7f1be84ccb90>)
    self.__cond.__init__()
#43 Frame 0x3107030, for file /usr/lib64/python2.7/threading.py, line 704, in _reset_internal_locks (self=<Thread(_Thread__ident=139756474193664, _Thread__block=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1bea03d890>, _RLock__count=0) at remote 0x7f1be83d1e90>, acquire=<instancemethod at remote 0x7f1be83fa1e0>, _is_owned=<instancemethod at remote 0x7f1be83fa320>, _release_save=<instancemethod at remote 0x7f1be83fa280>, release=<instancemethod at remote 0x7f1be83fa230>, _acquire_restore=<instancemethod at remote 0x7f1be83fa2d0>, _Verbose__verbose=False, _Condition__waiters=[<thread.lock at remote 0x7f1be84d60b0>]) at remote 0x7f1be84ccb50>, _Thread__name='Thread-21', _Thread__daemonic=True, _state=1, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1be84d6150...(truncated)
    self.__started._reset_internal_locks()
#46 Frame 0x326ce60, for file /usr/lib64/python2.7/threading.py, line 1231, in _after_fork (new_active={}, current=<_MainThread(_Thread__ident=139758080321344, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f1bf0bc62b0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f1bf0bc62b0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f1bf0bc62b0>) at remote 0x7f1bf073b690>, _Thread__name='MainThread', _Thread__daemonic=False, _children=<WeakKeyDictionary(_remove=<function at remote 0x7f1beb56d8c0>, data={<weakref at remote 0x7f1be853f2b8>: None, <weakref at remote 0x7f1be853fc58>: None, <weakref at remote 0x7f1be853f8e8>: None, <weakref at remote 0x7f1be9ff5d60>: None, <weakref at remote 0x7f1be853fb50>: None, <weakref at remote 0x7f1be9ff53c0>: None, <weakref at remote 0x7f1be9ff5e10>: None, <weakref at remote 0x7f1be853fba8>: None, <weakref at remote 0x7f1be853f788>: None, <weakref at remote 0x7f1be9ff5cb0...(truncated)
    thread._reset_internal_locks()
#56 Frame 0x7f1bea932050, for file /usr/lib64/python2.7/multiprocessing/forking.py, line 121, in __init__ (self=<Popen(returncode=None) at remote 0x7f1be85e2810>, process_obj=<WorkerProcess(_play_context=<PlayContext(_squashed=False, _loader=None, _uuid='0242ac11-0006-bf35-c83a-00000000001f', _validated=False, _attributes={'prompt': '', 'debugger': None, 'vars': {}, 'ignore_unreachable': None, 'become_user': u'root', 'sudo_exe': u'sudo', 'su_exe': u'su', 'skip_tags': set([]), 'gather_subset': ['all'], 'module_compression': u'ZIP_DEFLATED', 'ignore_errors': None, 'diff': False, 'sftp_extra_args': None, 'port': None, 'become_flags': u'', 'force_handlers': False, 'su_flags': u'', 'no_log': None, 'executable': u'/bin/sh', 'remote_user': None, 'become_method': u'sudo', 'only_tags': set([u'all']), 'module_defaults': None, 'environment': None, 'ssh_executable': u'ssh', 'docker_extra_args': None, 'remote_addr': None, 'become_pass': None, 'connection_user': None, 'ssh_args': u'-C -o ControlMaster=auto -o ControlPersist=60...(truncated)
    self.pid = os.fork()
#67 Frame 0x7f1bea91fe50, for file /usr/lib64/python2.7/multiprocessing/process.py, line 130, in start (self=<WorkerProcess(_play_context=<PlayContext(_squashed=False, _loader=None, _uuid='0242ac11-0006-bf35-c83a-00000000001f', _validated=False, _attributes={'prompt': '', 'debugger': None, 'vars': {}, 'ignore_unreachable': None, 'become_user': u'root', 'sudo_exe': u'sudo', 'su_exe': u'su', 'skip_tags': set([]), 'gather_subset': ['all'], 'module_compression': u'ZIP_DEFLATED', 'ignore_errors': None, 'diff': False, 'sftp_extra_args': None, 'port': None, 'become_flags': u'', 'force_handlers': False, 'su_flags': u'', 'no_log': None, 'executable': u'/bin/sh', 'remote_user': None, 'become_method': u'sudo', 'only_tags': set([u'all']), 'module_defaults': None, 'environment': None, 'ssh_executable': u'ssh', 'docker_extra_args': None, 'remote_addr': None, 'become_pass': None, 'connection_user': None, 'ssh_args': u'-C -o ControlMaster=auto -o ControlPersist=60s', 'shell': None, 'ssh_extra_args': None, 'ssh_transfer_method': ...(truncated)
    self._popen = Popen(self)
#70 Frame 0x31063a0, for file /usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py, line 317, in _queue_task (self=<StrategyModule(_pending_results=0, noop_task=<Task(_squashed=False, _loader=<DataLoader(_tempfiles=set([]), _FILE_CACHE={u'/opt/apb/project/test.yml': <AnsibleSequence(_column_number=1, _data_source=u'/opt/apb/project/test.yml', _line_number=3) at remote 0x7f1bed927f70>, u'/opt/ansible/roles/automation-broker-apb/handlers/main.yml': <AnsibleSequence(_column_number=1, _data_source=u'/opt/ansible/roles/automation-broker-apb/handlers/main.yml', _line_number=2) at remote 0x7f1bea94b578>, u'/opt/apb/inventory/hosts': <AnsibleUnicode(_column_number=1, _data_source=u'/opt/apb/inventory/hosts', _line_number=1) at remote 0x7f1bed971fb0>, u'/opt/apb/env/extravars': {u'broker_image': u'ansibleplaybookbundle/origin-ansible-service-broker:canary', u'wait_for_broker': u'true', u'broker_auto_escalate': u'false', u'create_broker_namespace': u'true'}, u'/opt/ansible/roles/automation-broker-apb/defau...(truncated)
    worker_prc.start()
#74 Frame 0x2173970, for file /usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py, line 306, in run (self=<StrategyModule(_pending_results=0, noop_task=<Task(_squashed=False, _loader=<DataLoader(_tempfiles=set([]), _FILE_CACHE={u'/opt/apb/project/test.yml': <AnsibleSequence(_column_number=1, _data_source=u'/opt/apb/project/test.yml', _line_number=3) at remote 0x7f1bed927f70>, u'/opt/ansible/roles/automation-broker-apb/handlers/main.yml': <AnsibleSequence(_column_number=1, _data_source=u'/opt/ansible/roles/automation-broker-apb/handlers/main.yml', _line_number=2) at remote 0x7f1bea94b578>, u'/opt/apb/inventory/hosts': <AnsibleUnicode(_column_number=1, _data_source=u'/opt/apb/inventory/hosts', _line_number=1) at remote 0x7f1bed971fb0>, u'/opt/apb/env/extravars': {u'broker_image': u'ansibleplaybookbundle/origin-ansible-service-broker:canary', u'wait_for_broker': u'true', u'broker_auto_escalate': u'false', u'create_broker_namespace': u'true'}, u'/opt/ansible/roles/automation-broker-apb/defaults/main.y...(truncated)
    self._queue_task(host, task, task_vars, play_context)
#78 Frame 0x216fe60, for file /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py, line 292, in run (self=<TaskQueueManager(_run_tree=False, _workers=[None], passwords={'conn_pass': None, 'become_pass': None}, _connection_lockfile=<file at remote 0x7f1bed943540>, _callbacks_loaded=True, _options=<Values(subset=None, ask_pass=False, listtags=None, become_user=u'root', sudo=False, private_key_file=None, syntax=None, skip_tags=[], diff=False, sftp_extra_args='', check=False, force_handlers=False, remote_user=None, become_method=u'sudo', listtasks=None, ask_su_pass=False, vault_ids=[], inventory=[u'/opt/apb/inventory'], vault_password_files=[], forks=5, listhosts=None, ssh_extra_args='', tags=[u'all'], become_ask_pass=False, start_at_task=None, flush_cache=None, step=None, become=False, su_user=None, ask_sudo_pass=False, extra_vars=[u'@/opt/apb/env/extravars'], verbosity=0, su=False, ssh_common_args='', connection=u'smart', ask_vault_pass=False, timeout=10, module_path=None, sudo_user=None, scp_ex...(truncated)
    play_return = strategy.run(iterator, play_context)
#82 Frame 0x221b710, for file /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py, line 153, in run (self=<PlaybookExecutor(passwords={'conn_pass': None, 'become_pass': None}, _options=<Values(subset=None, ask_pass=False, listtags=None, become_user=u'root', sudo=False, private_key_file=None, syntax=None, skip_tags=[], diff=False, sftp_extra_args='', check=False, force_handlers=False, remote_user=None, become_method=u'sudo', listtasks=None, ask_su_pass=False, vault_ids=[], inventory=[u'/opt/apb/inventory'], vault_password_files=[], forks=5, listhosts=None, ssh_extra_args='', tags=[u'all'], become_ask_pass=False, start_at_task=None, flush_cache=None, step=None, become=False, su_user=None, ask_sudo_pass=False, extra_vars=[u'@/opt/apb/env/extravars'], verbosity=0, su=False, ssh_common_args='', connection=u'smart', ask_vault_pass=False, timeout=10, module_path=None, sudo_user=None, scp_extra_args='') at remote 0x7f1bed9b3950>, _loader=<DataLoader(_tempfiles=set([]), _FILE_CACHE={u'/opt/apb/project/t...(truncated)
    result = self._tqm.run(play=play)
#86 Frame 0x2201a80, for file /usr/lib/python2.7/site-packages/ansible/cli/playbook.py, line 117, in run (self=<PlaybookCLI(action=None, callback=None, args=[u'test.yml'], parser=<SortedOptParser(process_default_values=True, allow_interspersed_args=True, _long_opt={'--skip-tags': <Option(_long_opts=['--skip-tags'], help='only run plays and tasks whose tags do not match these values', callback_args=None, callback=None, default=[], nargs=1, choices=None, dest='skip_tags', container=<...>, _short_opts=[], action='append', const=None, callback_kwargs=None, type='string', metavar=None) at remote 0x7f1beda26ab8>, '--vault-id': <Option(_long_opts=['--vault-id'], help='the vault identity to use', callback_args=None, callback=None, default=[], nargs=1, choices=None, dest='vault_ids', container=<...>, _short_opts=[], action='append', const=None, callback_kwargs=None, type='string', metavar=None) at remote 0x7f1beda26a28>, '--flush-cache': <Option(_long_opts=['--flush-cache'], help='clear the fact cache for every host in in...(truncated)
    results = pbex.run()
#90 Frame 0x7f1bf6b2fa00, for file /usr/bin/ansible-playbook, line 110, in <module> ()
    exit_code = cli.run()
(gdb) 

This seems to always come back to:
#18 Frame 0x7f1bea98bc20, for file /usr/lib/python2.7/site-packages/kubernetes/client/api_client.py, line 78, in __del__ (self=<ApiClient(default_headers={'User-Agent': 'Swagger-Codegen/8.0.0/python'}, configuration=<Configuration(username='', ssl_ca_cert='/var/run/secrets/kubernetes.io/serviceaccount/ca.crt', cert_file=None, connection_pool_maxsize=40, verify_ssl=True, proxy=None, host='https://172.30.0.1:443', password='', key_file=None, assert_hostname=None, api_key_prefix={}, logger_stream_handler=<StreamHandler(stream=<file at remote 0x7f1bf6b981e0>, level=0, lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f1bea03d190>, _RLock__count=0) at remote 0x7f1be9ff3b50>, _name=None, filters=[], formatter=<Formatter(datefmt=None, _fmt='%(asctime)s %(levelname)s %(message)s') at remote 0x7f1be9ff3b10>) at remote 0x7f1be9ff3ad0>, temp_folder_path=None, _Configuration__debug=False, logger_formatter=<...>, _Configuration__logger_file=None, logger={'package_logger': <Logge...(truncated) self.pool.join()

I think we're seeing kubernetes-client/python#411

I'm 99% certain this is kubernetes-client/python#411 now. Barring an unrelated error pulling an image from Docker Hub, we've had several successful runs since I patched out async.

RUN sed -i '67d' /usr/lib/python2.7/site-packages/kubernetes/client/api_client.py
RUN sed -i '76d' /usr/lib/python2.7/site-packages/kubernetes/client/api_client.py
RUN sed -i 's/self.pool.join()/pass/' /usr/lib/python2.7/site-packages/kubernetes/client/api_client.py
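The net effect of those edits, roughly, is shown below. Note the "before" body is inferred from kubernetes-client/python#411 and the exact contents of the deleted lines 67 and 76 are assumptions; this is a hypothetical stand-in, not the patched file itself. The point is that the finalizer no longer touches the thread pool, so a GC pass can never block on a join.

```python
# Hypothetical stand-in for the patched kubernetes ApiClient finalizer.
class PatchedApiClient(object):
    def __del__(self):
        pass  # was roughly: self.pool.close(); self.pool.join()

client = PatchedApiClient()
del client  # finalizer is now a no-op, so nothing to hang on
print("no hang")
```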

👏 I am impressed.

I wonder if this failure in Travis is the same thing (since it would likely be using a canary apb-base)

https://travis-ci.org/openshift/ansible-service-broker/jobs/467704584#L2723

It seems likely.

The Travis CI actually builds its own image for each run using the canary Dockerfile:
https://travis-ci.org/openshift/ansible-service-broker/jobs/467704584#L2049

If it were using canary it wouldn't be hanging since I pushed an image with async disabled that doesn't appear to fail anymore.

I suspect I'm confused because I read this as saying it uses the canary apb-base: https://travis-ci.org/openshift/ansible-service-broker/jobs/467704584#L2051-L2055

Sorry, I misread your statement. Yes, you're correct. canary apb-base has this issue so building from it will result in an image with the issue unless async is disabled or a proper fix goes in.

If it's sufficiently problematic, we could add the sed commands to apb-base:canary, which would prevent the issue where it seems most prevalent. I have not witnessed any issues with async disabled, but that doesn't mean none could arise.
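For what it's worth, one direction a proper fix could take (a sketch under my own assumptions, not the actual upstream patch) is to create the pool lazily and tear it down through an explicit close() under the caller's control, so teardown never happens inside `__del__` during garbage collection:

```python
from multiprocessing.pool import ThreadPool

class LazyPoolClient(object):
    """Hypothetical sketch: pool is created on first async use and closed
    explicitly, never from a finalizer."""
    def __init__(self):
        self._pool = None

    @property
    def pool(self):
        # Lazy creation: purely synchronous callers never spawn threads,
        # so a fork() in that window has nothing to deadlock against.
        if self._pool is None:
            self._pool = ThreadPool(4)
        return self._pool

    def close(self):
        # Deterministic teardown the caller can run before any fork(),
        # instead of relying on GC timing.
        if self._pool is not None:
            self._pool.close()
            self._pool.join()
            self._pool = None

client = LazyPoolClient()
print(client.pool.apply(len, ("abcd",)))  # prints 4
client.close()
```

That keeps async support available while removing the finalizer-time join that the backtraces point at.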

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/close

@jmrodri: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.