ansible/event-driven-ansible

url_check.py cannot deal with unavailable sites

mglantz opened this issue · 5 comments

If a website is completely down (for example, the connection is refused), the url_check.py plugin bails out, as shown below:

[mglantz@darkred event-driven-ansible]$ ansible-rulebook --rulebook website-automation.yml -i inventory --verbose
INFO:ansible_rulebook.app:Starting sources
INFO:ansible_rulebook.app:Starting rules
INFO:ansible_rulebook.engine:run_ruleset
INFO:ansible_rulebook.engine:ruleset define: {"name": "Listen for events on a webhook", "hosts": ["all"], "sources": [{"EventSource": {"name": "ansible.eda.url_check", "source_name": "ansible.eda.url_check", "source_args": {"urls": ["http://rhel9apache.sudo.net"], "delay": 10}, "source_filters": []}}], "rules": [{"Rule": {"name": "Web site is up", "condition": {"AllCondition": [{"EqualsExpression": {"lhs": {"Event": "url_check.status"}, "rhs": {"String": "up"}}}]}, "action": {"Action": {"action": "run_playbook", "action_args": {"name": "site_up.yml"}}}, "enabled": true}}, {"Rule": {"name": "Web site is down", "condition": {"AllCondition": [{"EqualsExpression": {"lhs": {"Event": "url_check.status"}, "rhs": {"String": "down"}}}]}, "action": {"Action": {"action": "run_playbook", "action_args": {"name": "site_down.yml"}}}, "enabled": true}}]}
INFO:ansible_rulebook.engine:load source
INFO:ansible_rulebook.engine:load source filters
INFO:ansible_rulebook.engine:Calling main in ansible.eda.url_check
INFO:ansible_rulebook.engine:Waiting for event from Listen for events on a webhook
ERROR:ansible_rulebook.engine:Source error
Traceback (most recent call last):
  File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "/usr/lib64/python3.10/asyncio/base_events.py", line 1064, in create_connection
    raise exceptions[0]
  File "/usr/lib64/python3.10/asyncio/base_events.py", line 1049, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib64/python3.10/asyncio/base_events.py", line 960, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib64/python3.10/asyncio/selector_events.py", line 500, in sock_connect
    return await fut
  File "/usr/lib64/python3.10/asyncio/selector_events.py", line 535, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('192.168.120.130', 80)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ansible_rulebook/engine.py", line 150, in start_source
    await entrypoint(fqueue, args)
  File "/home/mglantz/.ansible/collections/ansible_collections/ansible/eda/plugins/event_source/url_check.py", line 38, in main
    async with session.get(url) as resp:
  File "/usr/lib64/python3.10/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/usr/lib64/python3.10/site-packages/aiohttp/client.py", line 535, in _request
    conn = await self._connector.connect(
  File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 907, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
    raise last_exc
  File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host rhel9apache.sudo.net:80 ssl:default [Connect call failed ('192.168.120.130', 80)]
INFO:ansible_rulebook.engine:Canceling all ruleset tasks
INFO:ansible_rulebook.app:Cancelling event source tasks
INFO:ansible_rulebook.app:Main complete

I'm not sure whether this is intended, but it can be fixed by adding error handling around the aiohttp call.

Well, actually, the plugin cannot deal with any ClientError-related issue, including SSL errors.
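A minimal sketch of what such error handling could look like, not necessarily the fix that was merged. The event fields `url`, `status` and `status_code` come from this thread; the helper name `check_url`, the `error_msg` field and the "200 means up" rule are assumptions made for illustration.

```python
import asyncio

import aiohttp


async def check_url(session, url):
    """Return one url_check event for `url` instead of raising on client errors."""
    try:
        async with session.get(url) as resp:
            # Assumption: only HTTP 200 counts as "up".
            status = "up" if resp.status == 200 else "down"
            return dict(
                url_check=dict(url=url, status=status, status_code=resp.status)
            )
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        # Refused connections and SSL failures derive from aiohttp.ClientError;
        # a timeout may surface as asyncio.TimeoutError, so both are caught.
        # `error_msg` is a hypothetical field, not part of the plugin.
        return dict(
            url_check=dict(url=url, status="down", error_msg=str(exc))
        )


async def poll(queue, urls, delay):
    """Check every URL each `delay` seconds and queue the resulting events."""
    async with aiohttp.ClientSession() as session:
        while True:
            for url in urls:
                await queue.put(await check_url(session, url))
            await asyncio.sleep(delay)
```

With this shape the source keeps running when a site is unreachable, so the "Web site is down" rule can still fire.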

I agree. IMO this plugin can be improved by adding the following capabilities:

  1. Consider networking issues as errors or events
  2. Certificate verification
  3. Custom retries and timeouts
  4. Expose the response body in the event data
  5. Follow redirects

AFAICS, this fix would handle 1. and 2., as well as timeouts, although it doesn't distinguish between an SSL error, a timeout and a connection failure: they are all caught as ClientError. I note that aio-libs/aiohttp#4064 seems to contain discussion about dealing with specific exceptions in aiohttp, so not handling specific error types perhaps saves some work if aiohttp is to be used going forward.

One thing that struck me is that the returned event structure doesn't fit connection-related issues well.

    dict(
        url_check=dict(
            url=url,
            status="down",
            status_code=404,
        )
    )

Perhaps something like below instead would work better.

    dict(
        url_check=dict(
            url=url,
            status="down",
            error_type="http|ssl|tcp",
        )
    )

In the suggested fix, I set status_code to 500, indicating internal server error, but that is IMHO forcing the shoe to fit.
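As a rough sketch of how an `error_type` field could be filled in instead of a made-up status code, the helper below maps an exception to a coarse label using only standard-library exception classes. The function names `classify_error` and `down_event`, and the extra labels "timeout" and "unknown", are assumptions for illustration; how aiohttp's own exception classes would map onto these labels is not verified here.

```python
import asyncio
import ssl


def classify_error(exc):
    """Map an exception to a coarse error_type label (labels are hypothetical)."""
    # ssl.SSLError subclasses OSError, so it must be tested before OSError.
    if isinstance(exc, ssl.SSLError):
        return "ssl"
    # On newer Python versions asyncio.TimeoutError is also an OSError,
    # so it is tested before the generic OSError branch as well.
    if isinstance(exc, asyncio.TimeoutError):
        return "timeout"
    # Refused or reset connections, unreachable hosts and similar.
    if isinstance(exc, OSError):
        return "tcp"
    return "unknown"


def down_event(url, exc):
    """Build a "down" event that carries an error_type instead of a status code."""
    return dict(
        url_check=dict(
            url=url,
            status="down",
            error_type=classify_error(exc),
        )
    )
```

A rule could then match on `url_check.error_type` to choose between, say, a firewall playbook and a certificate renewal playbook. The "http" label would be set separately, for responses that arrive with a non-success status code.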

I tested the initial patch both with a firewall dropping traffic and with the web server process shut down. url_check no longer bails out, so a "down" event can trigger a playbook that fixes the firewall and restarts the web server process. I can imagine that this could be used to mitigate DDoS attacks and similar problems as well.

Closing as fix has been merged.