Unexpected failure during module execution: ... Read timed out. (read timeout=30)
module0x90 opened this issue · 4 comments
What is the issue ?
We were previously running the latest version of ara in a container on RH7. Everything worked fine and ara recorded
every single playbook run we did, including concurrent ones.
We have now moved Ansible and ara to the latest RH8 and decided to run Ansible locally as a service. As part of the migration I simply copied ansible.sqlite to the new system, configured it, and started the service. All was good: no errors, no issues, and we were able to access the earlier Ansible runs just fine.
The new system is RH8, running sqlite-libs-3.26.0-17.el8_7.x86_64 and ara 1.6.1. As far as I remember, the container version
was also 1.6.1. We can no longer run the container because of other changes on the original system; it fails by default.
We have now noticed that whenever we run our playbook against a number of systems (or even just 2 systems in parallel), we always get this error:
TASK [Record who is running the playbook] ***********************************************************************************************************************************
task path: /ansibleproduction/ara.yml:30
ok: [servername.fully.qualified] => {
    "changed": false,
    "created": "2023-07-28T08:16:50.993098Z",
    "key": "runasuser",
    "msg": "Record created or updated in ARA",
    "playbook_id": 3187,
    "type": "text",
    "updated": "2023-07-28T08:16:50.993175Z",
    "value": "localuser"
}
The full traceback is:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib64/python3.11/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 844, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 538, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 370, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='ansibleproduction.fully.qualified', port=8000): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ansible/executor/task_executor.py", line 158, in run
    res = self._execute()
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ansible/executor/task_executor.py", line 629, in _execute
    result = self._handler.run(task_vars=vars_copy)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/plugins/action/ara_record.py", line 184, in run
    play = self.client.get("/api/v1/plays?uuid=%s" % parent._parent._play._uuid)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 99, in get
    return self._request("get", endpoint, params=kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 83, in _request
    response = func(url, **kwargs)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 48, in get
    return self._request("get", url, **payload)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 44, in _request
    return self.http.request(method, self.endpoint + url, timeout=self.timeout, **payload)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='ansibleproduction.fully.qualified', port=8000): Read timed out. (read timeout=30)
fatal: [servername.fully.qualified]: FAILED! => {
    "msg": "Unexpected failure during module execution: HTTPConnectionPool(host='ansibleproduction.fully.qualified', port=8000): Read timed out. (read timeout=30)",
    "stdout": ""
}
What should be happening ?
Please advise on the reason for the error above and on a potential workaround.
The server has been running continuously but, for some reason, no longer responds during concurrent runs.
The only things that have changed are the servers (and, most likely, the ara configuration). We did not suddenly have a
requirement to run Ansible against 100s (or 1000s) of systems: the sqlite db had only very few records added
after the migration and the first proper run, where Ansible only ran against ~20 systems.
Hi and thanks for the issue @module0x90.
At first glance, I think this could be a manifestation of a known (but unsolved) issue of using ara_record across more than one host: #378
Could you find out whether you can still reproduce the issue when using, for example, run_once?
Thank you, @dmsimard.
I checked and that's indeed the only task in our ara "preamble" which does not have run_once (and delegate_to localhost).
It is just a bit strange that this issue only appeared after we migrated to a new system. This part of our playbooks rarely
changes.
We will implement this, test it, and report back on whether the addition makes the error go away.
Hi, sorry for my delayed response.
We've implemented that as suggested (adding run_once and delegate_to) and then ran it against multiple systems in parallel. It worked great and the error has gone away.
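For reference, a minimal sketch of what such a task might look like. The task name, key and type are taken from the output above; the `value` expression is only an assumption for illustration, since the original task definition was not posted.

```yaml
# Sketch only: the real task in ara.yml was not shown in this issue.
- name: Record who is running the playbook
  ara_record:
    key: runasuser
    # Assumption: how the user name is obtained is hypothetical.
    value: "{{ lookup('env', 'USER') }}"
    type: text
  # Run the task a single time for the whole play instead of once per host,
  # so that parallel hosts do not all hit the ARA API server at once.
  run_once: true
  # Execute on the control node rather than on a managed host.
  delegate_to: localhost
```

With run_once, the record is written once per play, which should be sufficient when the value is the same for every host.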
Thanks, @dmsimard.
Closing.
Thanks for the update @module0x90 !