ansible-community/ara

Unexpected failure during module execution: ... Read timed out. (read timeout=30)

module0x90 opened this issue · 4 comments

What is the issue ?

We were previously running the latest version of ara in a container on RH7. Everything worked and ara recorded
every single playbook run we were doing, including concurrent ones.
Now we've moved Ansible and ara to the latest RH8 and decided to run Ansible locally as a service. As part of the migration I simply copied ansible.sqlite to the new system, configured it, and started the service. All was good: no errors, no issues, and we were able to access Ansible runs from before the migration just fine.

The new system is RH8, running sqlite-libs-3.26.0-17.el8_7.x86_64 and ara 1.6.1. I remember the container version
was also 1.6.1. We cannot run the container anymore due to other changes on the original system; it now fails by default.

We have now noticed that whenever we run our playbook against a number of systems (even just 2 systems in parallel), we always get this error:

TASK [Record who is running the playbook] ***********************************************************************************************************************************
task path: /ansibleproduction/ara.yml:30
ok: [servername.fully.qualified] => {
    "changed": false,
    "created": "2023-07-28T08:16:50.993098Z",
    "key": "runasuser",
    "msg": "Record created or updated in ARA",
    "playbook_id": 3187,
    "type": "text",
    "updated": "2023-07-28T08:16:50.993175Z",
    "value": "localuser"
}
The full traceback is:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib64/python3.11/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 844, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 538, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 370, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='ansibleproduction.fully.qualified', port=8000): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ansible/executor/task_executor.py", line 158, in run
    res = self._execute()
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ansible/executor/task_executor.py", line 629, in _execute
    result = self._handler.run(task_vars=vars_copy)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/plugins/action/ara_record.py", line 184, in run
    play = self.client.get("/api/v1/plays?uuid=%s" % parent._parent._play._uuid)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 99, in get
    return self._request("get", endpoint, params=kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 83, in _request
    response = func(url, **kwargs)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 48, in get
    return self._request("get", url, **payload)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ara/clients/http.py", line 44, in _request
    return self.http.request(method, self.endpoint + url, timeout=self.timeout, **payload)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='ansibleproduction.fully.qualified', port=8000): Read timed out. (read timeout=30)
fatal: [servername.fully.qualified]: FAILED! => {
    "msg": "Unexpected failure during module execution: HTTPConnectionPool(host='ansibleproduction.fully.qualified', port=8000): Read timed out. (read timeout=30)",
    "stdout": ""
}

What should be happening ?

Please advise on the cause of the error above and a potential workaround.
The server has been running continuously but, for some reason, now stops responding during concurrent runs.
The only things that have changed are the servers (and most likely the ara configuration). We did not suddenly have a
requirement to run Ansible against 100s (or 1000s) of systems, meaning the sqlite db had only very few records added
after the migration and the first proper run, where Ansible only ran against ~20 systems.

Hi and thanks for the issue @module0x90.

At first glance, I think this could be a manifestation of a known (but unsolved) issue of using ara_record across more than one host: #378

Could you find out whether you can still reproduce the issue when the task uses, for example, run_once ?

Thank you, @dmsimard.
I checked and that's indeed the only task in our ara "preamble" which does not have run_once (and delegate_to localhost).

It is just a bit strange that this issue only appeared after we migrated to a new system. This part of our playbooks rarely
changes.

We will implement this, test and report back whether this addition makes the error go away.

Hi, sorry for my delayed response.

We've implemented that as suggested (adding run_once and delegate_to) and then ran it against multiple systems in parallel. It worked great and the error has gone away.
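
For anyone landing here with the same timeout, a minimal sketch of what the adjusted task might look like. The task name, key and type are taken from the output above; the original task is not shown in this thread, so the value expression is only an assumption for illustration.

```yaml
# Sketch only: record the key once per playbook run from the control node,
# rather than once per host, so a single request is sent to the ara API.
- name: Record who is running the playbook
  ara_record:
    key: runasuser
    # Assumption: the real task may compute this value differently.
    value: "{{ lookup('env', 'USER') }}"
    type: text
  run_once: true
  delegate_to: localhost
```

With run_once the task executes for a single host in the batch, and delegate_to: localhost runs it on the control node, so parallel hosts no longer each hit the API with the same record.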

Thanks, @dmsimard.
Closing.

Thanks for the update @module0x90 !